DISTRIBUTIONS
PARETO vs NORMAL DISTRIBUTION
One crucial thing to understand is that in your life, especially as an engineer, you will inevitably deal with data. You may dive deep into the world of data analytics, studying concepts like mean, variance, skewness, and kurtosis. The main point, however, is to pay attention to what the data can actually tell you. You cannot cherry-pick only the information that suits your perspective.
For example, some people argue that we are (or were) in a long period of peace, suggesting that human nature has become less violent. Does the data truly support this theory? Maybe it does, but data can often be manipulated or misinterpreted to support almost any viewpoint, and there are studies that support the opposite.
This course is not about statistics or probability, but I want to provide you with some foundational knowledge to help you understand that you must approach data with a critical mind. It’s essential to question and reason about data, using your understanding of the world to interpret it accurately.
As Nassim Taleb explained, our data typically falls into one of two categories: Mediocristan or Extremistan.
Mediocristan represents data that follows a Gaussian (normal) distribution. Examples include IQ, height, test scores, and blood pressure.
In this domain, individual variations are limited, and outliers have minimal impact on the overall picture.
Extremistan, on the other hand, characterizes data where a small number of extreme values dominate. Examples include wealth, sales, city populations, pandemics, and deaths in wars.
In this domain, rare and unpredictable events can drastically shape the outcome, making averages and predictions unreliable.
Recognizing whether your data belongs to Mediocristan or Extremistan helps ensure you apply the right tools and draw accurate, meaningful insights.
For example, it’s important to distinguish between outliers and extreme values:
Outlier: Represents a data point that is highly unlikely or practically impossible within the given context. It often indicates an error or an anomaly that can be excluded from the analysis.
Extreme Value: Represents a rare but possible event, occurring with a very small probability. Unlike outliers, extreme values must be carefully considered, as they often expose vulnerabilities in models and reveal important risks. Ignoring them can lead to misleading conclusions and poor decision-making.
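To make the distinction concrete, here is a minimal sketch that compares tail probabilities under the two models using their closed-form survival functions. The numbers are assumptions chosen for illustration only: heights are taken as normal with mean 170 cm and standard deviation 10 cm, and wealth as Pareto with shape 1.16 and minimum 10. The helper names `normal_sf` and `pareto_sf` are hypothetical.

```python
import math

def normal_sf(x, mean, std):
    """P(X > x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))

def pareto_sf(x, alpha, x_min):
    """P(X > x) for a Pareto distribution with shape alpha and minimum x_min."""
    return 1.0 if x <= x_min else (x_min / x) ** alpha

# Height (assumed normal): a reading of 270 cm is 10 standard deviations above the mean
p_height = normal_sf(270.0, mean=170.0, std=10.0)

# Wealth (assumed Pareto): a value 1000 times the minimum
p_wealth = pareto_sf(10_000.0, alpha=1.16, x_min=10.0)

print(f"P(height > 270 cm)         = {p_height:.2e}")
print(f"P(wealth > 1000 x minimum) = {p_wealth:.2e}")
```

Under these assumptions the height reading is so improbable that a measurement error is the more plausible explanation, which is the outlier case. The wealth value is rare but would still be expected a few hundred times in a population of one million, which is the extreme-value case: dropping it would discard real information.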
Example: height and wealth
For more interesting examples, see the work of *Pasquale Cirillo* or *Nassim Taleb*:
https://www.pasqualecirillo.eu/ (content in Italian and English)
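As one possible version of the height and wealth example, the sketch below draws samples from a normal and a Pareto distribution and measures how much of the total is held by the largest 1% of observations. The parameters (mean 170 and standard deviation 10 for height, shape 1.16 and minimum 10 for wealth) and the helper `top_share` are assumptions made for illustration, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 100_000

# Mediocristan: heights in cm, assumed normal with mean 170 and std 10
heights = rng.normal(loc=170.0, scale=10.0, size=n)

# Extremistan: wealth, assumed Pareto with shape 1.16 and minimum 10
alpha, x_min = 1.16, 10.0
wealth = (rng.pareto(alpha, size=n) + 1.0) * x_min

def top_share(values, fraction=0.01):
    """Share of the total held by the largest `fraction` of observations."""
    k = max(1, int(len(values) * fraction))
    ordered = np.sort(values)
    return ordered[-k:].sum() / ordered.sum()

height_share = top_share(heights)
wealth_share = top_share(wealth)
print(f"Top 1% share of total height: {height_share:.3f}")
print(f"Top 1% share of total wealth: {wealth_share:.3f}")
```

For heights, the tallest 1% contribute only slightly more than 1% of the total, so no single observation matters much. For wealth, the same 1% hold a large part of the total, and the exact figure changes noticeably from one random seed to another, which is itself a sign that sample averages are unstable in this regime.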
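import numpy as np
import matplotlib.pyplot as plt
a, m = 1.16, 10.  # shape and mode (for a Pareto, the mode is the minimum value)
# np.random.pareto samples a Lomax (Pareto II); adding 1 and scaling by m gives a classical Pareto
s = (np.random.pareto(a, 1000) + 1) * m
# Normalized histogram of the samples
count, bins, _ = plt.hist(s, 100, density=True)
# Theoretical Pareto density evaluated at the bin edges
fit = a*m**a / bins**(a+1)
# Rescale the density so its peak matches the tallest histogram bar
plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')
plt.show()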
# Parameters for the Pareto distribution
alpha = 5
m = 10.0 # minimum value (scale parameter)
# Define the range for x (values of wealth)
x = np.linspace(m, 200, 500) # adjust the upper limit as needed
# Compute the CDF of the Pareto distribution
cdf = 1 - (m / x)**alpha
# Plot the CDF
plt.figure(figsize=(8, 5))
plt.plot(x, cdf, label=f'Pareto CDF (alpha={alpha}, m={m})', color='red')
plt.xlabel('Wealth (x)')
plt.ylabel('Cumulative Probability')
plt.title('Cumulative Distribution Function (CDF) of the Pareto Distribution')
plt.legend()
plt.grid(True)
plt.show()
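The CDF plotted above can be inverted by hand to get quantiles: solving p = 1 - (m / x)^alpha for x gives x = m * (1 - p)^(-1 / alpha). The sketch below uses that inversion to compare the median and the 99th percentile for a thin tail (alpha = 5) and a fat tail (alpha = 1.16). The function name `pareto_quantile` is hypothetical.

```python
def pareto_quantile(p, alpha, x_min):
    """Invert the Pareto CDF F(x) = 1 - (x_min / x)**alpha to get x for a given p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1).")
    return x_min * (1.0 - p) ** (-1.0 / alpha)

x_min = 10.0
for alpha in (5.0, 1.16):
    median = pareto_quantile(0.50, alpha, x_min)
    p99 = pareto_quantile(0.99, alpha, x_min)
    # The ratio shows how far the tail stretches relative to a typical value
    print(f"alpha={alpha}: median={median:.1f}, 99th percentile={p99:.1f}, "
          f"ratio={p99 / median:.1f}")
```

With alpha = 5 the 99th percentile stays within a small multiple of the median, while with alpha = 1.16 it is many times larger. This is why the upper limit of the x axis in the plot matters so much when alpha is small.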
def people_for_wealth_share(wealth_share, alpha, population=1):
    """
    Compute the fraction (or number) of the richest individuals that holds a given share
    of the total wealth under a Pareto distribution.

    The model assumes that the wealth share of the top fraction q is given by:
        W_top(q) = q^(1 - 1/alpha)    (valid for alpha > 1, where the mean is finite)
    Given a desired wealth share (e.g., 0.90 for 90%), the corresponding fraction q
    is computed as:
        q = exp( ln(wealth_share) / (1 - 1/alpha) )

    Parameters:
        wealth_share: float
            The desired fraction of total wealth (between 0 and 1) held by the top portion.
        alpha: float
            The Pareto distribution shape parameter (must be greater than 1).
        population: int or float, optional
            The total population. Defaults to 1, in which case the result is the fraction of people.

    Returns:
        float: The number (or fraction) of people holding the specified wealth share.
    """
    if not (0 < wealth_share < 1):
        raise ValueError("wealth_share must be between 0 and 1 (non-inclusive).")
    if alpha <= 1:
        # For alpha <= 1 the mean is infinite and the exponent below is zero or negative
        raise ValueError("alpha must be greater than 1.")
    exponent = 1 - 1 / alpha
    q_fraction = np.exp(np.log(wealth_share) / exponent)
    return q_fraction * population
# Example usage:
alpha = 1.5
wealth_share = 0.80  # fraction of total wealth to analyze (0.80 = 80%)
# If population is normalized to 1, it returns a fraction
fraction_people = people_for_wealth_share(wealth_share, alpha)
print(f"Fraction of people holding {wealth_share*100:.0f}% of wealth: {fraction_people:.3f}")
# If you have a specific population
population = 8000000000
number_people = people_for_wealth_share(wealth_share, alpha, population)
print(f"Number of people holding {wealth_share*100:.0f}% of wealth: {number_people:.1f}")
Fraction of people holding 80% of wealth: 0.512
Number of people holding 80% of wealth: 4096000000.0
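The same top-share formula, W_top(q) = q^(1 - 1/alpha), also explains where the shape value 1.16 comes from: setting W_top(0.2) = 0.8 and solving for alpha gives alpha = ln(5) / ln(4), roughly 1.161, which is the classic 80/20 rule. The sketch below checks this and shows how the result moves with alpha. It is self-contained, so it redefines the computation under the hypothetical name `top_fraction_for_share`.

```python
import math

def top_fraction_for_share(wealth_share, alpha):
    """Fraction q of the richest people holding `wealth_share`,
    obtained by inverting W_top(q) = q**(1 - 1/alpha)."""
    if not 0.0 < wealth_share < 1.0:
        raise ValueError("wealth_share must be between 0 and 1 (non-inclusive).")
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1.")
    return wealth_share ** (1.0 / (1.0 - 1.0 / alpha))

# Shape parameter for which the top 20% hold exactly 80% of the wealth
alpha_80_20 = math.log(5) / math.log(4)
q = top_fraction_for_share(0.80, alpha_80_20)
print(f"alpha = {alpha_80_20:.3f} -> top {q:.1%} hold 80% of the wealth")

# The closer alpha gets to 1, the more concentrated the wealth
for alpha in (1.16, 1.5, 2.0, 5.0):
    q = top_fraction_for_share(0.80, alpha)
    print(f"alpha = {alpha}: top {q:.1%} hold 80% of the wealth")
```

With alpha = 1.5, as in the example above, the richest 51.2% hold 80% of the wealth, a much milder concentration than the 80/20 case.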