Reading a CSV File
Load real-world data from CSV, Excel, and other formats
Explanation
The most common way to get data into pandas is pd.read_csv().
df = pd.read_csv('data.csv')

Useful parameters:
```python
# Specify which column to use as index
df = pd.read_csv('data.csv', index_col='id')

# Parse dates automatically
df = pd.read_csv('data.csv', parse_dates=['date'])

# Read only certain columns
df = pd.read_csv('data.csv', usecols=['name', 'age', 'salary'])

# Handle missing value markers
df = pd.read_csv('data.csv', na_values=['N/A', 'none', '-'])

# Limit rows read (useful for large files)
df = pd.read_csv('data.csv', nrows=1000)
```
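These parameters can also be combined in one call. The sketch below reads a small in-memory CSV through io.StringIO instead of data.csv, so it runs without any file on disk; the column names and values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'data.csv'; columns and values are made up.
csv_text = """id,name,dept,age,salary,date
1,Ann,ops,34,52000,2021-03-01
2,Bob,ops,N/A,48000,2021-04-15
3,Cy,dev,29,-,2021-05-20
4,Dee,dev,41,61000,2021-06-30
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="id",                                      # use 'id' as the index
    usecols=["id", "name", "age", "salary", "date"],     # skip the 'dept' column
    parse_dates=["date"],                                # convert 'date' to datetime
    na_values=["N/A", "none", "-"],                      # treat these as missing
    nrows=3,                                             # read only the first 3 data rows
)

print(df.dtypes)
print(df)
```

Because 'age' and 'salary' each contain a missing marker, pandas stores them as floats rather than integers; that is worth checking whenever na_values is in play.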
Other formats:
pd.read_excel('file.xlsx')
pd.read_json('file.json')
pd.read_sql(query, connection)
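As a rough illustration of two of these readers, the sketch below feeds pd.read_json a JSON string wrapped in io.StringIO and runs pd.read_sql against an in-memory SQLite database. The table name, columns, and rows are hypothetical. pd.read_excel is left out because reading .xlsx files needs an extra engine such as openpyxl to be installed.

```python
import io
import sqlite3

import pandas as pd

# JSON: wrap the text in StringIO so pandas treats it like a file.
json_text = '[{"name": "Ann", "age": 34}, {"name": "Bob", "age": 41}]'
df_json = pd.read_json(io.StringIO(json_text))

# SQL: an in-memory SQLite database stands in for a real connection.
# The 'people' table is invented for this example.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (name TEXT, age INTEGER)")
connection.executemany(
    "INSERT INTO people VALUES (?, ?)", [("Ann", 34), ("Bob", 41)]
)
query = "SELECT name, age FROM people WHERE age > 35"
df_sql = pd.read_sql(query, connection)
connection.close()

print(df_json)
print(df_sql)
```

Both calls return an ordinary DataFrame, so everything that works after read_csv works here too.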
Saving back:
df.to_csv('output.csv', index=False)  # index=False avoids writing the row numbers
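To see what index=False changes, the sketch below calls to_csv() without a path, which returns the CSV text as a string instead of writing a file. The two-row DataFrame is made up for the comparison.

```python
import pandas as pd

# Small made-up frame, just to compare the two outputs.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [34, 41]})

# With no path argument, to_csv() returns the CSV text.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # header starts with an empty field for the index column
print(without_index)  # header contains only the real column names
```

If the index column is written and the file is later reloaded with a plain read_csv call, it typically comes back as an extra unnamed column, which is why index=False is a common default when the index is just row numbers.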
Examples
Load, peek, and check
Always check shape and info() after loading
import pandas as pd
df = pd.read_csv('titanic.csv')
print(df.shape) # (891, 12)
print(df.head())
df.info() # prints dtypes and non-null counts per column; it returns None, so no print() is needed
# Save a cleaned version
df.to_csv('titanic_clean.csv', index=False)
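info() reports how many values are present in each column; to count the missing ones directly, isna().sum() is a common companion check. The sketch below uses a tiny invented table rather than the real titanic.csv, so the column names and values are assumptions.

```python
import io

import pandas as pd

# Hypothetical miniature passenger table; not the real titanic.csv.
csv_text = """PassengerId,Age,Cabin
1,22,
2,38,C85
3,,
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                 # (rows, columns)
null_counts = df.isna().sum()   # number of missing values per column
print(null_counts)
```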