Data Analysis Workflow

This guide covers AI-assisted data analysis workflows with Rawi: data exploration, statistical analysis, visualization, machine learning, and reporting across a range of data formats and domains.

Prerequisites:

  • Rawi configured with a suitable provider
  • Data files (CSV, JSON, Excel, databases)
  • Basic understanding of data analysis concepts
  • Python/R environment (optional for advanced analysis)

Initial data understanding and quality assessment:

# Explore CSV data structure
rawi ask --file sales-data.csv "Analyze this dataset structure, data types, missing values, and provide initial insights"
# Profile dataset quality
rawi ask --file customer-data.xlsx "Perform comprehensive data quality assessment including duplicates, outliers, and inconsistencies"
# Multi-file data exploration
rawi ask --files sales.csv customers.csv products.csv "Analyze relationships between these datasets and suggest join strategies"
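
A local baseline makes it easier to sanity-check what the model reports about structure, missing values, and duplicates. The following is a minimal sketch using only the Python standard library; the `profile_csv` helper and the sample columns (`order_id`, `region`, `amount`) are hypothetical and not part of Rawi.

```python
import csv
import io
from collections import Counter


def profile_csv(text):
    """Return row count, per-column missing counts, and duplicate row count."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    seen = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or whitespace-only cells as missing
            if value is None or value.strip() == "":
                missing[name] += 1
        seen[tuple(row.get(name) for name in columns)] += 1
    # Every repeat of an already-seen row counts as one duplicate
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return {"rows": rows, "missing": missing, "duplicates": duplicates}


# Hypothetical sample data standing in for a real file
SAMPLE = """order_id,region,amount
1,north,10.5
2,south,
2,south,
3,,7.0
"""

report = profile_csv(SAMPLE)
print(report)
```

For a real file, pass the result of `open(path, newline="").read()` to the helper and compare the counts with the model's summary.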

Perform comprehensive statistical analysis:

# Descriptive statistics
rawi ask --file survey-data.csv --act data-analyst "Generate comprehensive descriptive statistics and identify key patterns"
# Correlation analysis
rawi ask --file financial-data.csv --act statistician "Perform correlation analysis and identify significant relationships between variables"
# Hypothesis testing
rawi ask --file experiment-results.csv --act statistician "Perform appropriate statistical tests to validate experimental hypotheses"
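
When the model reports a correlation, recomputing it locally is a cheap check. The sketch below computes a Pearson correlation coefficient with the standard library only; the `pearson` function name and the sample values are illustrative assumptions, and a real analysis would more likely use pandas or SciPy.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: y is exactly 2 * x, so the coefficient is 1
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
print(round(r, 6))
```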

Generate visualization strategies and code:

# Visualization strategy
rawi ask --file time-series-data.csv --act data-visualizer "Recommend optimal visualization strategies for this time series data"
# Python visualization code
rawi ask --file sales-data.csv --act data-scientist "Generate Python code using matplotlib/seaborn for comprehensive data visualization"
# Dashboard specifications
rawi ask --file kpi-data.csv --act dashboard-designer "Design dashboard layout and visualizations for business KPIs"

Apply machine learning techniques:

# ML model recommendations
rawi ask --file customer-churn.csv --act machine-learning-engineer "Recommend appropriate ML models for customer churn prediction"
# Feature engineering
rawi ask --file raw-features.csv --act data-scientist "Suggest feature engineering techniques to improve model performance"
# Model evaluation
rawi ask --file model-results.csv --act ml-engineer "Analyze model performance metrics and suggest improvements"

Comprehensive data analysis automation:

#!/bin/bash
# data-analysis-pipeline.sh - Comprehensive data analysis workflow
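set -euo pipefail
DATA_FILE="${1:-data.csv}"
# Analysis type: exploratory, statistical, ml, or business (not used by the steps below)
ANALYSIS_TYPE="${2:-exploratory}"
OUTPUT_DIR="analysis_$(date +%Y%m%d_%H%M%S)"
# Fail early if the input file is missing
if [ ! -f "$DATA_FILE" ]; then
  echo "Error: data file not found: $DATA_FILE" >&2
  exit 1
fi
echo "📊 Starting data analysis for $DATA_FILE"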
# Create analysis directory structure
mkdir -p "$OUTPUT_DIR"/{reports,visualizations,insights,code,documentation}
# 1. Data profiling and quality assessment
echo "🔍 Performing data profiling..."
rawi ask --file "$DATA_FILE" --act data-analyst "
Perform comprehensive data profiling including:
- Dataset structure and dimensions
- Data types and formats
- Missing value analysis
- Duplicate record detection
- Outlier identification
- Data quality score
- Summary statistics
- Initial observations and concerns
" > "$OUTPUT_DIR/reports/data-profile.md"
# 2. Exploratory data analysis
echo "📈 Conducting exploratory data analysis..."
rawi ask --file "$DATA_FILE" --act data-scientist "
Conduct thorough exploratory data analysis including:
- Distribution analysis for all variables
- Correlation matrix and relationships
- Trend identification and patterns
- Seasonal patterns (if time-based)
- Categorical variable analysis
- Numerical variable distributions
- Potential data transformations needed
- Business insights and opportunities
" > "$OUTPUT_DIR/reports/eda-report.md"
echo "✅ Data analysis complete!"
echo "📊 Analysis results saved to: $OUTPUT_DIR"
# Financial performance analysis
rawi ask --file financial-statements.xlsx --act financial-analyst "
Analyze financial performance including:
- Profitability ratios and trends
- Liquidity and solvency analysis
- Cash flow analysis
- Revenue growth patterns
- Cost structure optimization
- Investment efficiency metrics
- Risk assessment indicators
"
# Portfolio analysis
rawi ask --file portfolio-data.csv --act investment-analyst "
Perform portfolio analysis including:
- Asset allocation assessment
- Risk-return analysis
- Diversification evaluation
- Performance attribution
- Benchmark comparison
- Optimization recommendations
"
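
The risk-return item in the portfolio prompt can be cross-checked by hand. This is a rough sketch, assuming a list of per-period returns as input; `risk_return` is a hypothetical helper, the ratio is a per-period Sharpe-style figure (not annualized), and the sample returns are made up.

```python
import statistics


def risk_return(returns, risk_free=0.0):
    """Summarize per-period returns: mean, sample volatility, Sharpe-style ratio."""
    if len(returns) < 2:
        raise ValueError("need at least two return observations")
    mean = statistics.fmean(returns)
    volatility = statistics.stdev(returns)  # sample standard deviation
    # The ratio is undefined when returns never vary
    sharpe = (mean - risk_free) / volatility if volatility > 0 else None
    return {"mean": mean, "volatility": volatility, "sharpe": sharpe}


# Hypothetical per-period returns
summary = risk_return([0.02, -0.01, 0.03, 0.00])
print(summary)
```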
# Campaign performance analysis
rawi ask --file campaign-data.csv --act marketing-analyst "
Analyze marketing campaign performance including:
- ROI and ROAS calculations
- Channel effectiveness comparison
- Customer acquisition costs
- Conversion funnel analysis
- Audience segmentation insights
- Attribution modeling
- Budget optimization recommendations
"
# Customer behavior analysis
rawi ask --file customer-journey.csv --act customer-analyst "
Analyze customer behavior patterns including:
- Purchase journey mapping
- Churn prediction indicators
- Lifetime value calculations
- Segmentation analysis
- Behavioral triggers identification
- Retention strategy recommendations
"
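
The ROI, ROAS, and acquisition-cost figures requested in the campaign prompt are simple ratios, so they are easy to verify locally. The sketch below uses common definitions (ROAS as revenue over spend, ROI as net return over spend, CAC as spend per new customer); definitions vary between teams, and the `Campaign` fields and sample numbers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    spend: float        # total campaign cost
    revenue: float      # revenue attributed to the campaign
    new_customers: int  # customers acquired by the campaign


def campaign_metrics(campaign):
    """Compute ROAS, ROI, and CAC for one campaign."""
    if campaign.spend <= 0:
        raise ValueError("spend must be positive")
    return {
        "roas": campaign.revenue / campaign.spend,
        "roi": (campaign.revenue - campaign.spend) / campaign.spend,
        # CAC is undefined when no customers were acquired
        "cac": campaign.spend / campaign.new_customers
        if campaign.new_customers
        else None,
    }


# Hypothetical campaign figures
metrics = campaign_metrics(Campaign("search", 1000.0, 2500.0, 50))
print(metrics)
```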
# Process efficiency analysis
rawi ask --file operations-data.csv --act operations-analyst "
Analyze operational efficiency including:
- Process performance metrics
- Bottleneck identification
- Resource utilization analysis
- Quality metrics assessment
- Cost optimization opportunities
- Productivity improvements
- Automation potential
"
# Supply chain analysis
rawi ask --file supply-chain.csv --act supply-chain-analyst "
Analyze supply chain performance including:
- Inventory optimization
- Supplier performance metrics
- Demand forecasting accuracy
- Lead time analysis
- Cost structure evaluation
- Risk assessment
"
# Time series forecasting
rawi ask --file time-series.csv --act time-series-analyst "
Perform comprehensive time series analysis including:
- Trend and seasonality decomposition
- Stationarity testing
- ARIMA modeling recommendations
- Forecasting accuracy assessment
- Anomaly detection
- Intervention analysis
- Business cycle identification
"
# Predictive modeling
rawi ask --file historical-data.csv --act predictive-analyst "
Develop predictive models including:
- Feature importance analysis
- Model selection and validation
- Hyperparameter optimization
- Cross-validation strategies
- Performance benchmarking
- Deployment considerations
"
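
For the anomaly detection item in the time series prompt, a z-score rule offers a quick baseline to compare against the model's findings. This is a deliberately simple stand-in, not an ARIMA-based or seasonality-aware method; `zscore_anomalies`, the threshold of 2.0, and the sample series are all assumptions.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return indices whose distance from the mean exceeds threshold * stdev."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    spread = statistics.stdev(values)  # sample standard deviation
    if spread == 0:
        return []  # a constant series has no outliers under this rule
    return [
        index
        for index, value in enumerate(values)
        if abs(value - mean) / spread > threshold
    ]


# Hypothetical series with one obvious spike
series = [10, 11, 9, 10, 12, 10, 50, 11]
anomalies = zscore_anomalies(series)
print(anomalies)
```

A global mean and standard deviation ignore trend and seasonality, so this rule suits roughly stationary series only.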
# Sentiment analysis
rawi ask --file reviews-data.csv --act nlp-analyst "
Perform text analytics including:
- Sentiment analysis and scoring
- Topic modeling and themes
- Keyword extraction
- Text classification
- Emotion detection
- Content recommendations
- Brand perception analysis
"
# Document analysis
rawi ask --file survey-responses.txt --act text-analyst "
Analyze text data including:
- Thematic analysis
- Content categorization
- Frequency analysis
- Readability assessment
- Language pattern identification
- Summary generation
"
# Dashboard specification
rawi ask --file kpi-data.csv --act dashboard-designer "
Design comprehensive dashboard including:
- KPI visualization layout
- Interactive filter components
- Drill-down capabilities
- Real-time data integration
- Mobile responsiveness
- User access controls
- Export functionality
"
# Visualization best practices
rawi ask --file complex-data.csv --act data-visualizer "
Recommend visualization best practices including:
- Chart type selection
- Color scheme optimization
- Layout and composition
- Accessibility considerations
- Interactive elements
- Performance optimization
- User experience design
"
# Automated reporting pipeline
rawi ask --file report-data.csv --act reporting-analyst "
Design automated reporting system including:
- Report template design
- Data refresh schedules
- Distribution mechanisms
- Quality validation checks
- Exception handling
- Performance monitoring
- Version control
"
# Comprehensive data quality audit
rawi ask --batch "data/**/*.csv" --act data-quality-analyst "
Perform data quality assessment including:
- Completeness analysis
- Accuracy validation
- Consistency checks
- Timeliness evaluation
- Validity assessment
- Uniqueness verification
- Integrity validation
"
# Data governance framework
rawi ask --act data-governance-expert "
Design data governance framework including:
- Data ownership definitions
- Quality standards and metrics
- Access control policies
- Data lifecycle management
- Compliance requirements
- Audit procedures
- Training programs
"
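
For A/B test results with binary outcomes (converted or not), a two-proportion z-test gives a significance figure and a confidence interval to compare with the model's answer. The sketch assumes large samples and independent groups; `two_proportion_ztest` and the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error is used for the test statistic
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se_pooled == 0:
        raise ValueError("test is undefined when every outcome is identical")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error is used for the confidence interval
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return {"diff": diff, "z": z, "p_value": p_value,
            "ci": (diff - margin, diff + margin)}


# Hypothetical counts: 100/1000 conversions for A, 130/1000 for B
result = two_proportion_ztest(100, 1000, 130, 1000)
print(result)
```

Statistical significance here says nothing about practical significance; the interval for `diff` is the better guide to whether the effect is large enough to act on.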
# A/B testing analysis
rawi ask --file ab-test-results.csv --act experimental-designer "
Analyze A/B testing results including:
- Statistical significance testing
- Effect size calculations
- Power analysis
- Confidence intervals
- Multiple testing corrections
- Practical significance assessment
- Recommendation formulation
"
# Causal inference
rawi ask --file observational-data.csv --act causal-analyst "
Perform causal analysis including:
- Confounding variable identification
- Instrumental variable analysis
- Propensity score matching
- Difference-in-differences
- Regression discontinuity
- Sensitivity analysis
"
Data quality:

  • Validate data sources and collection methods
  • Check for missing values and outliers
  • Ensure data consistency and accuracy
  • Document data transformations
  • Implement quality monitoring

Statistical analysis:

  • Choose appropriate statistical methods
  • Validate assumptions
  • Use proper significance levels
  • Consider multiple testing corrections
  • Report confidence intervals

Visualization:

  • Choose appropriate chart types
  • Use clear and intuitive designs
  • Ensure accessibility
  • Provide proper context
  • Enable interactivity when useful

Reproducibility:

  • Document analysis procedures
  • Version control code and data
  • Use reproducible environments
  • Provide clear instructions
  • Share code and methodologies
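
As one concrete instance of the multiple testing corrections mentioned in the list above, the Holm-Bonferroni step-down procedure controls the family-wise error rate across several tests. The sketch below is a minimal standard-library version; `holm_bonferroni` and the sample p-values are illustrative assumptions.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject/keep flag per hypothesis, in the original order."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        # The threshold loosens as hypotheses are rejected: alpha / (m - rank)
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Hypothetical p-values from four separate tests
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(decisions)
```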
# Python analysis template
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load and explore data
df = pd.read_csv('data.csv')
df.info()  # prints its summary directly and returns None
print(df.describe())
# Generated by Rawi analysis recommendations
# [Insert AI-generated analysis code here]
# R analysis template
library(dplyr)
library(ggplot2)
library(tidyr)
library(corrplot)
# Load and explore data
data <- read.csv('data.csv')
summary(data)
str(data)
# Generated by Rawi analysis recommendations
# [Insert AI-generated analysis code here]