
Data Analysis with Python

Python Data Analysis Ecosystem

Python offers a powerful suite of libraries for data analysis, manipulation, and visualization. These tools form the foundation of modern data science workflows.

# Core data analysis libraries:
import numpy as np # Numerical computing
import pandas as pd # Data manipulation
import matplotlib.pyplot as plt # Visualization
import seaborn as sns # Statistical visualization

# Typical workflow:
# 1. Load data → 2. Clean data → 3. Explore data
# 4. Analyze data → 5. Visualize results

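These libraries interoperate: pandas is built on NumPy arrays, and Matplotlib and seaborn accept pandas objects directly, so the same DataFrame can move through every step of the workflow, from simple exploration to statistical analysis.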
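The workflow steps listed above can be sketched end to end with pandas alone. This is a minimal illustration, not a prescribed pipeline: the CSV content is hypothetical and is read from an in-memory string instead of a real file, filling with the median is just one possible cleaning choice, and the plotting step is left out so the example depends only on pandas.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
raw_csv = io.StringIO(
    "Name,Age,City\n"
    "Alice,25,New York\n"
    "Bob,,London\n"
    "Charlie,35,Paris\n"
    "Charlie,35,Paris\n"
)

# 1. Load data
df = pd.read_csv(raw_csv)

# 2. Clean data: drop the repeated row, then fill the missing age with the median
df = df.drop_duplicates().reset_index(drop=True)
df["Age"] = df["Age"].fillna(df["Age"].median())

# 3. Explore data
print(df.describe())

# 4. Analyze data: average age per city
age_by_city = df.groupby("City")["Age"].mean()
print(age_by_city)
```

Each step reassigns `df`, because most pandas methods return a new object rather than modifying the original.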

Pandas Fundamentals

Pandas provides the DataFrame, a labeled two-dimensional table whose rows and columns can be selected by label or by position.

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)

# Basic operations
df.head() # First 5 rows
df.info() # DataFrame info
df.describe() # Statistical summary

# Selecting data
df['Name'] # Single column
df.loc[0] # Row by label
df.iloc[0] # Row by position

# Filtering
df[df['Age'] > 30] # People older than 30
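With the default index, `df.loc[0]` and `df.iloc[0]` return the same row, which hides the difference between them. The sketch below uses an assumed custom index (10, 20, 30) to show that `.loc` looks up labels while `.iloc` counts positions. The index values are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # assumed non-default labels, for illustration only
)

row_by_label = df.loc[10]      # label 10 is the first row (Alice)
row_by_position = df.iloc[0]   # position 0 is also the first row (Alice)

# .loc also accepts a boolean mask plus a column name in one call
older_names = df.loc[df["Age"] > 30, "Name"]
print(older_names.tolist())
```

Here `df.loc[0]` would raise a `KeyError`, because no row carries the label 0.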

Data Cleaning with Pandas

Real-world data is often messy. Pandas provides tools to handle missing data, duplicates, and inconsistencies.

# Handling missing data
df.isna().sum() # Count missing values
df.dropna() # Drop rows with missing values
df.fillna(0) # Fill missing with 0
df.fillna(df.mean(numeric_only=True)) # Fill numeric columns with their column means

# Removing duplicates
df.drop_duplicates()

# Data type conversion
df['Age'] = df['Age'].astype('float')

# String operations
df['Name'].str.upper() # Convert to uppercase
df['Name'].str.contains('Ali') # Find names containing 'Ali'

# DateTime handling (assumes df has a 'Date' column of date strings)
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
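The cleaning calls above return new objects and leave `df` unchanged unless the result is assigned. The following sketch chains several of them on a small, hypothetical messy table. The column values, the `cleaned` name, and the use of `str.title()` to normalize capitalization are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one missing age, one repeated row, inconsistent casing
df = pd.DataFrame({
    "Name": ["alice", "Bob", "Charlie", "Charlie"],
    "Age": [25, np.nan, 35, 35],
    "Date": ["2023-01-15", "2024-03-01", "2022-07-30", "2022-07-30"],
})

missing_before = int(df["Age"].isna().sum())  # number of missing ages

cleaned = df.drop_duplicates().copy()                          # remove the repeated row
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].mean())  # fill with the mean age
cleaned["Name"] = cleaned["Name"].str.title()                  # normalize capitalization
cleaned["Date"] = pd.to_datetime(cleaned["Date"])              # parse strings into datetimes
cleaned["Year"] = cleaned["Date"].dt.year                      # extract the year

print(cleaned)
```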

Data Aggregation & Grouping

Pandas provides powerful tools for grouping and aggregating data to extract insights.

# Grouping data
grouped = df.groupby('City')
grouped.mean(numeric_only=True) # Mean of each numeric column by city

# Multiple aggregations
df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])

# Pivot tables
pd.pivot_table(df, values='Age', index='City', aggfunc='mean')

# Cross tabulation
pd.crosstab(df['City'], df['Age'] > 30) # Count per city of people over 30 vs. not

# Merging DataFrames (df1 and df2 are any two DataFrames sharing a 'key' column)
pd.merge(df1, df2, on='key') # SQL-style join
pd.concat([df1, df2]) # Stack vertically
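To make the grouping, cross tabulation, and merge calls concrete, here is a small worked sketch. The extra row, the `countries` lookup table, and the `AgeGroup` labels are hypothetical additions made so that each operation has something to aggregate.

```python
import pandas as pd

# Hypothetical data with two people in London so the aggregates are non-trivial
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, 35, 45],
    "City": ["New York", "London", "Paris", "London"],
})

# Several aggregations of Age per city
stats_by_city = df.groupby("City")["Age"].agg(["mean", "min", "max", "count"])

# Cross tabulation: how many people per city fall in each age group
age_group = (
    (df["Age"] > 30)
    .map({True: "over 30", False: "30 or under"})
    .rename("AgeGroup")
)
counts = pd.crosstab(df["City"], age_group)

# SQL-style left join against a hypothetical lookup table
countries = pd.DataFrame({
    "City": ["New York", "London", "Paris"],
    "Country": ["USA", "UK", "France"],
})
merged = pd.merge(df, countries, on="City", how="left")

print(stats_by_city)
print(counts)
print(merged)
```

A left join keeps every row of `df` in its original order, which is usually what is wanted when attaching lookup information.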

Data Visualization

Visualizations help uncover patterns and communicate findings effectively.

# Matplotlib basics
plt.plot(df['Age'])
plt.title('Age by Row Index') # A line plot shows values in row order, not a distribution
plt.xlabel('Index')
plt.ylabel('Age')
plt.show()

# Seaborn for statistical plots
sns.histplot(df['Age'])
sns.boxplot(x='City', y='Age', data=df)
sns.scatterplot(x='Age', y='Income', hue='City', data=df) # Assumes df has an 'Income' column

# Pandas built-in plotting
df.plot(kind='bar', x='Name', y='Age')
df['Age'].plot(kind='hist')

# Advanced visualizations
sns.pairplot(df) # Scatter matrix
sns.heatmap(df.corr(numeric_only=True), annot=True) # Correlation matrix of numeric columns

Advanced Analysis Techniques

Python offers powerful tools for statistical analysis and machine learning.

# Statistical analysis with scipy (two-sample t-test; each group needs at least two rows)
from scipy import stats
stats.ttest_ind(df[df['City']=='New York']['Age'],
            df[df['City']=='London']['Age'])

# Linear regression with statsmodels (assumes df has an 'Income' column)
import statsmodels.api as sm
X = sm.add_constant(df['Age'])
model = sm.OLS(df['Income'], X).fit()
print(model.summary())

# Machine learning with scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = df[['Age']]
y = df['Income']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42) # Fixed seed for a reproducible split
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)
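Both regression snippets above fit an ordinary least squares line of Income on Age. As a dependency-light way to see what that fit computes, the sketch below uses NumPy's `polyfit` in place of statsmodels or scikit-learn. The age and income values are hypothetical, invented for illustration, and R² is computed by hand from the residuals.

```python
import numpy as np

# Hypothetical data, for illustration only
age = np.array([25, 30, 35, 40, 45], dtype=float)
income = np.array([30000, 36000, 41000, 47000, 52000], dtype=float)

# Degree-1 polyfit is a least squares straight-line fit: income ≈ slope * age + intercept
slope, intercept = np.polyfit(age, income, 1)

# Predictions and the coefficient of determination (R^2)
predicted = slope * age + intercept
ss_res = np.sum((income - predicted) ** 2)      # residual sum of squares
ss_tot = np.sum((income - income.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r_squared:.4f}")
```

For a single predictor, statsmodels' `OLS` (with the added constant) and scikit-learn's `LinearRegression` estimate the same slope and intercept. They add conveniences such as summary statistics and a `predict` method.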
