Prompts for OpenAI Codex

Comprehensive Missing Data Handling

Identify and highlight all missing values in this dataset. Fill missing numerical values using the median of the column. Replace missing text values with 'Not Available'. Suggest the most likely values for missing entries based on existing patterns.
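For reference, the kind of code this prompt tends to produce might look like the following minimal pandas sketch. The sample data and the column names `age` and `city` are hypothetical, and the split into "numeric" and "text" columns by dtype is an assumption rather than a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Oslo", None, "Lima", "Oslo"],
})

# Boolean mask marking every missing cell, plus a per-column count.
missing_mask = df.isna()
missing_counts = missing_mask.sum()

cleaned = df.copy()
numeric_cols = cleaned.select_dtypes(include="number").columns
text_cols = cleaned.columns.difference(numeric_cols)

# Numeric gaps get the column median; text gaps get a fixed label.
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
cleaned[text_cols] = cleaned[text_cols].fillna("Not Available")
```

The mask is kept separate from the filled copy so the original gaps can still be highlighted or reviewed after imputation.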

Duplicate Detection and Removal

Find and delete exact duplicate rows while keeping the first occurrence. Identify near-duplicate records (e.g., customer names or emails with minor spelling differences) and suggest merging strategies. Where several records share the same key (such as an email address), keep only the most recent record.
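Keeping the first occurrence and keeping the most recent record are two different rules, so a sketch helps show where each applies. The example below assumes a hypothetical `updated_at` timestamp column and treats `email` as the record key; neither is specified by the prompt.

```python
import pandas as pd

# Hypothetical sample data; `updated_at` is an assumed timestamp column.
df = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", "b@x.com", "a@x.com"],
    "name": ["Ann", "Ann", "Bob", "Ann"],
    "updated_at": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-02-01", "2024-03-01"]
    ),
})

# Exact duplicates: rows identical in every column, keeping the first.
exact_deduped = df.drop_duplicates(keep="first")

# One row per email, keeping the most recent record.
latest_per_email = (
    df.sort_values("updated_at")
      .drop_duplicates(subset="email", keep="last")
      .sort_index()
)
```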

Outlier Detection and Treatment

Detect outliers in the 'income' column using the Z-score method. Replace values whose absolute Z-score is greater than 3 with the column median. Check that all values in the 'age' column are non-negative and remove any rows with invalid entries.
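A minimal sketch of the Z-score treatment described here, on hypothetical data. It uses the sample standard deviation (pandas' default) and the absolute Z-score; both choices are assumptions. Note that with very few rows no value can reach a Z-score of 3, so the sample has 21 rows.

```python
import pandas as pd

# Hypothetical sample data: 20 ordinary incomes plus one extreme value.
incomes = list(range(40_000, 60_000, 1_000)) + [1_000_000]
ages = [30] * len(incomes)
ages[3] = -5  # one invalid age
df = pd.DataFrame({"income": incomes, "age": ages})

# Z-score of each income, using the sample standard deviation.
z = (df["income"] - df["income"].mean()) / df["income"].std()
is_outlier = z.abs() > 3

cleaned = df.copy()
cleaned["income"] = cleaned["income"].astype(float)

# Replace flagged incomes with the column median, then drop negative ages.
cleaned.loc[is_outlier, "income"] = df["income"].median()
cleaned = cleaned[cleaned["age"] >= 0]
```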

Data Type Consistency and Encoding

Convert the 'gender' column to a categorical type. Apply one-hot encoding to the 'location' column. Standardize 'feature1' and 'feature2' using standard scaling (zero mean, unit variance).
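These three conversions can be sketched with pandas alone. The sample values are hypothetical, and the scaling below divides by the population standard deviation (`ddof=0`), which is an assumption about what "standard scaling" should mean here.

```python
import pandas as pd

# Hypothetical sample data using the column names from the prompt.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "location": ["north", "south", "north", "east"],
    "feature1": [1.0, 2.0, 3.0, 4.0],
    "feature2": [10.0, 20.0, 30.0, 40.0],
})

# Categorical dtype for a column with a small, fixed set of values.
df["gender"] = df["gender"].astype("category")

# One indicator column per distinct location value.
encoded = pd.get_dummies(df, columns=["location"], prefix="location")

# Standard scaling: subtract the mean, divide by the standard deviation.
for col in ["feature1", "feature2"]:
    encoded[col] = (encoded[col] - encoded[col].mean()) / encoded[col].std(ddof=0)
```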

Structured Exploratory Data Analysis Planning

I have JSON-formatted survey data with feedback from different technology courses (Python, SQL, R, etc.). Help me identify significant differences in sentiment, strengths, and weaknesses across these course types. Focus your analysis on: 1. Which technology has the highest overall satisfaction and why? 2. Are there common weaknesses that appear across multiple technologies? 3. Do completion rates correlate with overall ratings? 4. What are the unique strengths of each technology course? Provide your analysis in a structured format with headings for each question, and include specific evidence from the data to support your findings.
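Questions 1 and 3 in this prompt can also be checked directly before asking for a narrative analysis. The sketch below assumes a flat JSON array with hypothetical fields `course`, `rating`, and `completion_rate`; the prompt does not specify the actual schema, and the records are made up for illustration.

```python
import json

import pandas as pd

# Hypothetical records; the real survey schema is not given in the prompt.
raw = """[
  {"course": "Python", "rating": 4.6, "completion_rate": 0.82},
  {"course": "Python", "rating": 4.4, "completion_rate": 0.78},
  {"course": "SQL", "rating": 4.1, "completion_rate": 0.70},
  {"course": "SQL", "rating": 3.9, "completion_rate": 0.66},
  {"course": "R", "rating": 3.7, "completion_rate": 0.60},
  {"course": "R", "rating": 3.5, "completion_rate": 0.58}
]"""
survey = pd.DataFrame(json.loads(raw))

# Question 1: mean satisfaction per course and the top-rated course.
by_course = survey.groupby("course")[["rating", "completion_rate"]].mean()
top_course = by_course["rating"].idxmax()

# Question 3: Pearson correlation between completion rate and rating.
corr = survey["rating"].corr(survey["completion_rate"])
```

A correlation computed this way describes only a linear association across responses and says nothing about why a course is rated highly; the free-text questions still need the qualitative analysis the prompt asks for.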

Detect and Handle Missing Data

Identify and highlight all missing values in this dataset. Then, fill missing numerical values using the median of the column, and replace all missing text values with 'Not Available'. If possible, suggest the most likely values for missing entries based on existing patterns.

Remove and Merge Duplicate Records

Find and delete duplicate rows in this dataset while keeping the first occurrence. Also, identify near-duplicate customer records and suggest merging strategies. Highlight rows where the same email appears more than once, and find duplicate product names with slight spelling variations.
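Repeated emails and slightly misspelled product names need different techniques, which the following sketch illustrates. The column names, the sample rows, and the 0.9 similarity threshold are all assumptions; `similar_pairs` is a hypothetical helper, and it compares every pair of names, so it suits small lists only.

```python
from difflib import SequenceMatcher
from itertools import combinations

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "email": ["ann@x.com", "bob@x.com", "ann@x.com"],
    "product": ["Wireless Mouse", "Wireles Mouse", "USB Cable"],
})

# keep=False flags every row whose email also appears in another row.
df["email_repeated"] = df["email"].duplicated(keep=False)


def similar_pairs(names, threshold=0.9):
    """Return pairs of distinct names whose similarity ratio meets the threshold."""
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


near_duplicates = similar_pairs(df["product"])
```

The pairs are candidates for review rather than automatic merges, since a high similarity ratio can also match genuinely different products.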

Step-by-Step Data Discrepancy Cleaning Plan

I have a CSV file [insert CSV file] containing sales transaction data from multiple store locations (columns: TransactionID, StoreID, SaleDate, ProductID, Quantity, and Price). Some rows have missing or incorrect StoreIDs, and some of the prices look off. Please outline a step-by-step approach to identify and handle these discrepancies, and provide sample Python code for cleaning tasks like removing or imputing missing StoreIDs and fixing price outliers.
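One possible shape for the cleaning code this prompt requests is sketched below. The sample rows, the `valid_stores` reference set, the 'UNKNOWN' label, and the per-product IQR rule for prices are all assumptions; a real plan would depend on how StoreIDs are issued and how prices vary.

```python
import pandas as pd

# Hypothetical sample with the columns named in the prompt.
sales = pd.DataFrame({
    "TransactionID": [1, 2, 3, 4, 5, 6],
    "StoreID": ["S01", None, "S02", "S99", "S01", "S02"],
    "SaleDate": pd.to_datetime(["2024-05-01"] * 6),
    "ProductID": ["P1"] * 6,
    "Quantity": [1, 2, 1, 3, 1, 2],
    "Price": [9.99, 10.49, 9.79, 10.19, 999.0, 10.09],
})
valid_stores = {"S01", "S02"}  # assumed reference list of real stores

# Step 1: flag StoreIDs that are missing or not in the reference list.
bad_store = ~sales["StoreID"].isin(valid_stores)

# Step 2: label them explicitly instead of guessing a store.
sales["StoreID"] = sales["StoreID"].where(~bad_store, "UNKNOWN")

# Step 3: flag prices outside 1.5 * IQR of the same product's prices.
grouped = sales.groupby("ProductID")["Price"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
bad_price = (sales["Price"] < q1 - 1.5 * iqr) | (sales["Price"] > q3 + 1.5 * iqr)

# Step 4: replace flagged prices with the product's median non-outlier price.
typical = sales["Price"].mask(bad_price).groupby(sales["ProductID"]).transform("median")
sales["Price"] = sales["Price"].where(~bad_price, typical)
```

The flags `bad_store` and `bad_price` are kept as separate series so the affected rows can be reported before any value is overwritten.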

AI-Assisted Data Analysis Planning

I have JSON-formatted survey data with feedback from different technology courses (Python, SQL, R, etc.). Help me identify significant differences in sentiment, strengths, and weaknesses across these course types. Focus your analysis on: 1. Which technology has the highest overall satisfaction and why? 2. Are there common weaknesses that appear across multiple technologies? 3. Do completion rates correlate with overall ratings? 4. What are the unique strengths of each technology course? Provide your analysis in a structured format with headings for each question, and include specific evidence from the data to support your findings.

Comprehensive Data Cleaning with Python

Given a dataset, write Python code using pandas to: 1. Fill missing numerical values with the column median. 2. Replace outliers in the 'income' column (absolute Z-score > 3) with the column median. 3. Standardize 'feature1' and 'feature2' using StandardScaler. 4. Convert the 'gender' column to categorical type. 5. One-hot encode the 'location' column. 6. Remove rows where 'age' is negative.
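The six steps can be combined into one function, as in the sketch below. Two choices here are assumptions rather than part of the prompt: negative ages are dropped first so they do not distort the medians and Z-scores, and the scaling is done in pandas with the population standard deviation, which matches the default behavior of scikit-learn's `StandardScaler` without requiring that library. `clean_dataset` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
    # Step 6 (run first): drop rows with a negative age; missing ages are kept.
    out = df[~(df["age"] < 0)].copy()

    # Step 1: fill missing numeric values with each column's median.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Step 2: replace incomes with |z| > 3 by the column median.
    income = out["income"].astype(float)
    z = (income - income.mean()) / income.std(ddof=0)
    out["income"] = income.mask(z.abs() > 3, income.median())

    # Step 3: standardize to zero mean and unit (population) variance.
    for col in ["feature1", "feature2"]:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    # Steps 4 and 5: categorical gender, one-hot encoded location.
    out["gender"] = out["gender"].astype("category")
    return pd.get_dummies(out, columns=["location"])


# Hypothetical demo data: one negative age, one extreme income, one gap.
n = 22
demo = pd.DataFrame({
    "age": [-1] + [30] * (n - 1),
    "income": [50_000.0]
    + [float(v) for v in range(40_000, 60_000, 1_000)]
    + [1_000_000.0],
    "feature1": np.arange(n, dtype=float),
    "feature2": np.arange(n, dtype=float) * 2,
    "gender": ["F", "M"] * (n // 2),
    "location": ["north", "south"] * (n // 2),
})
demo.loc[5, "feature1"] = np.nan
result = clean_dataset(demo)
```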
