Portfolio Project 3
Learning Goal 2 Worksheet
Dataset 1:
Variable Name: Population of UK towns and cities in 2011
Number of records: 7,724
Orders of Magnitude: NaN
Frequency of each leading digit:
1 - 25.36%
2 - 15.47%
3 - 14.09%
4 - 9.98%
5 - 9.48%
6 - 8.75%
7 - 6.56%
8 - 5.66%
9 - 4.65%
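A short script can reproduce leading-digit frequencies like the ones above from raw values. The minimal Python sketch below is an illustration, not the method used for this worksheet. The helper names are hypothetical and the sample populations are made up. The orders-of-magnitude helper assumes the definition log10(max / min), which the worksheet does not state, and it skips zero and negative values because log10 is undefined for them.

```python
import math
from collections import Counter

def leading_digit(value):
    """First non-zero digit of a non-zero number, e.g. 7724 -> 7, 0.05 -> 5."""
    # Scientific notation always starts with the leading digit: 7724 -> '7.72400000000000e+03'.
    return int(f"{abs(value):.14e}"[0])

def benford_expected():
    """Benford's Law proportions P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_percentages(values):
    """Percentage of positive values whose leading digit is 1..9."""
    digits = [leading_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    return {d: 100 * counts[d] / len(digits) for d in range(1, 10)}

def orders_of_magnitude(values):
    """Assumed definition (hypothetical): log10(max / min) over positive values only."""
    positive = [v for v in values if v > 0]  # log10 is undefined for 0 and negatives
    return math.log10(max(positive) / min(positive))

# Hypothetical populations, not taken from any dataset in this worksheet.
sample = [7724, 152, 31, 1200, 98000]
observed = leading_digit_percentages(sample)
for d, p in benford_expected().items():
    print(f"{d}: observed {observed[d]:.2f}%, Benford {100 * p:.2f}%")
print("orders of magnitude:", round(orders_of_magnitude(sample), 2))
```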
State your Hypotheses:
Ho: The distribution of the leading digits of this dataset follows Benford’s Law.
Ha: The distribution of the leading digits of this dataset does not follow Benford’s Law.
What is your p-value?
P-value = 0.00
What is your conclusion based on your p-value?
We reject the null hypothesis; this data does not follow Benford's Law.
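The worksheet does not say which test produced these p-values. One common choice for Benford's Law is a chi-square goodness-of-fit test with 8 degrees of freedom (nine digits minus one). The minimal Python sketch below assumes that test, and its function names are hypothetical. It uses the closed-form tail probability that exists for an even number of degrees of freedom, so it needs no SciPy. The statistic must be computed from counts, not percentages, so reported percentages are first multiplied back by the number of records.

```python
import math

# Benford proportions for leading digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square_statistic(observed_counts):
    """Pearson chi-square statistic against Benford's Law (counts, not percentages)."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, BENFORD))

def chi_square_sf_8df(x):
    """P(X > x) for a chi-square variable with 8 degrees of freedom.

    For an even number of degrees of freedom k, the tail probability is
    exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!, here with k = 8.
    """
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))

def benford_p_value(observed_counts):
    return chi_square_sf_8df(chi_square_statistic(observed_counts))

# Convert reported percentages back to counts (hypothetical n and percentages).
n = 1000
percentages = [30.0, 18.0, 12.0, 10.0, 8.0, 7.0, 6.0, 5.0, 4.0]
counts = [n * pct / 100 for pct in percentages]
print(f"p-value = {benford_p_value(counts):.4f}")
```

Because the statistic grows with the number of records, a very large dataset can yield a p-value near 0 even when its percentages look close to Benford's.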
Dataset 2:
Variable Name: Population of Spanish Cities
Number of records: 8,114
Orders of Magnitude: 6
Frequency of each leading digit:
1 - 31.07%
2 - 18.02%
3 - 12.42%
4 - 9.18%
5 - 7.95%
6 - 6.57%
7 - 5.36%
8 - 4.95%
9 - 4.47%
State your Hypotheses:
Ho: The distribution of the leading digits of this dataset follows Benford’s Law.
Ha: The distribution of the leading digits of this dataset does not follow Benford’s Law.
What is your p-value?
P-value = 0.9985
What is your conclusion based on your p-value?
We fail to reject the null hypothesis; this data is consistent with Benford's Law.
Dataset 3:
Variable Name: Brazilian Addresses Street Numbers
Number of records: 538,435
Orders of Magnitude: 5
Frequency of each leading digit:
1 - 29.55%
2 - 17.61%
3 - 12.77%
4 - 9.73%
5 - 8.21%
6 - 6.71%
7 - 5.69%
8 - 5.12%
9 - 4.56%
State your Hypotheses:
Ho: The distribution of the leading digits of this dataset follows Benford’s Law.
Ha: The distribution of the leading digits of this dataset does not follow Benford’s Law.
What is your p-value?
P-value = 0.00
What is your conclusion based on your p-value?
We reject the null hypothesis; this data does not follow Benford's Law.
Summarize:
In 4-5 sentences, why is it surprising that Benford’s Law held (or nearly held) across your three datasets?
It is surprising that Benford’s Law nearly held for one of my datasets because it is such a specific distribution, yet it appears across many kinds of natural data, especially large datasets. To me, the law feels counterintuitive, since real-world data is often assumed to be uniformly distributed, which would make every leading digit equally likely. I usually imagine natural data following linear or exponential patterns, which makes Benford’s Law even more surprising to observe. It is especially interesting that Benford’s Law holds for datasets that come from completely different backgrounds and contexts.
Learning Goal 4 - The Data
a. I chose to analyze the merchandise imports in Argentina between the years 1960 and 2023.
b. To make the data usable for this project, I transposed the list of years and Argentina's merchandise imports (in US dollars) into columns. After calculating the mean and standard deviation, I was able to choose a suitable indicator to use for hypothesis testing.
c. Annual merchandise imports above 10 billion US dollars would indicate that Argentina is not producing enough merchandise domestically and has become overly reliant on foreign goods.
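The cleaning step in (b) amounts to reshaping one wide row (one column per year) into (year, value) pairs and then summarizing the values. The minimal Python sketch below uses hypothetical function names and made-up import figures, and it assumes that years with no recorded value are left blank and should be skipped.

```python
import statistics

def transpose_wide_row(years, values):
    """Turn one wide spreadsheet row (one column per year) into (year, value)
    pairs, skipping years with no recorded value."""
    return [(year, value) for year, value in zip(years, values) if value is not None]

def summarize(pairs):
    """Mean and sample standard deviation of the values in (year, value) pairs."""
    amounts = [value for _, value in pairs]
    return statistics.mean(amounts), statistics.stdev(amounts)

# Hypothetical import values in US dollars; None marks a missing year.
years = [1960, 1961, 1962, 1963]
imports_usd = [1.2e9, None, 1.5e9, 1.8e9]
pairs = transpose_wide_row(years, imports_usd)
mean_usd, sd_usd = summarize(pairs)
print(pairs)
print(f"mean = {mean_usd:,.0f}, sd = {sd_usd:,.0f}")
```

statistics.stdev is the sample standard deviation (dividing by n - 1); statistics.pstdev gives the population version if that is what the spreadsheet used.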
Learning Goal 4 - Null and Alternative Hypothesis
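A. Ho: Argentina's mean annual merchandise imports are at most 10 billion US dollars (μ ≤ $10 billion).
Ha: Argentina's mean annual merchandise imports are greater than 10 billion US dollars (μ > $10 billion).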
B. I chose this threshold because it is the first significant jump in import value, reflecting a major change in Argentina’s economy and an increase in foreign dependency.
C. My significance level is 0.05
D. P-value: 0.823
Using a right-tailed z-test at the 0.05 significance level, we fail to reject the null hypothesis because 0.823 > 0.05. There is not sufficient evidence that Argentina's mean annual spending on imported merchandise exceeds 10 billion US dollars.
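The right-tailed z-test in (D) compares the sample mean with the 10 billion dollar threshold using z = (x̄ - μ0) / (s / √n). The p-value is the area to the right of z under the standard normal curve. The Python sketch below assumes the test used the mean and standard deviation of the annual import values, with n equal to the number of years (64 for 1960 to 2023). The function names are hypothetical and the inputs shown are made up, not Argentina's figures.

```python
import math
from statistics import NormalDist

def right_tailed_z_test(sample_mean, mu0, sd, n):
    """One-sample right-tailed z-test of H0: mu <= mu0 against Ha: mu > mu0."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z under N(0, 1)
    return z, p_value

def decision(p_value, alpha=0.05):
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical summary statistics, not Argentina's actual values.
z, p = right_tailed_z_test(sample_mean=9.0e9, mu0=10e9, sd=8.0e9, n=64)
print(f"z = {z:.3f}, p = {p:.3f} -> {decision(p)}")
```

A large p-value such as 0.823 only means the data give no evidence that the mean exceeds the threshold. It does not prove that the mean is below it.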
Learning Goal 5
a) I chose to analyze female life expectancy at birth in Belgium and Botswana, two countries on different continents, from 1960 to 2022.
b) To make this data suitable for hypothesis testing, I transposed Belgium's and Botswana's female life expectancy data onto a new sheet. I calculated the mean, standard deviation, and variance for each country, which I then used to perform the two-sample z-test.
c) This indicator is important and interesting because female life expectancy is often used as a measure of gender-specific health outcomes. We can use this data to better understand how Belgium's and Botswana's healthcare systems and cultures affect women specifically.
Learning Goal 5 - 2 Sample Z Test
a) Ho: There is no difference in mean female life expectancy at birth between Belgium and Botswana.
Ha: There is a difference in mean female life expectancy at birth between Belgium and Botswana.
b) I tested at the 0.05 significance level.
c) The p-value was 0.00. Because it is below 0.05, we reject the null hypothesis: the two-sample z-test provides evidence of a significant difference in female life expectancy between Belgium and Botswana at the 0.05 significance level.
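The two-sample z-test compares the two countries' means using z = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2). The p-value is two-tailed because Ha only states that the means differ. In the minimal Python sketch below, the function name and the life-expectancy values are hypothetical. It assumes each country contributes one value per year (63 values for 1960 to 2022). As in any z-test, it treats the sample variances as if they were the population variances, which is a reasonable approximation for samples of that size.

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_z_test(sample_a, sample_b):
    """Two-tailed two-sample z-test of H0: mu_a = mu_b against Ha: mu_a != mu_b."""
    standard_error = math.sqrt(variance(sample_a) / len(sample_a)
                               + variance(sample_b) / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # probability in both tails
    return z, p_value

# Hypothetical life-expectancy values in years; not the real series.
belgium = [73.5, 74.0, 74.4, 74.9, 75.3]
botswana = [52.0, 53.1, 54.0, 55.2, 56.1]
z, p = two_sample_z_test(belgium, botswana)
print(f"z = {z:.2f}, p = {p:.4f}")
```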
Learning Goal 6 - Pfizer
a) Pfizer is a global biopharmaceutical company that specializes in the research, development, and manufacturing of medicines and vaccines for the medical industry. It is related to both merchandise imports and female life expectancy at birth: it uses imported merchandise to produce products and research that directly help women facing age-related health issues.
b) Based on my earlier hypothesis test, there is not sufficient evidence that Argentina spends more than 10 billion dollars annually on imported merchandise. This might affect Pfizer because it suggests that Argentina may have lower demand for the products that Pfizer produces, potentially limiting Pfizer's market opportunities in the region.
Regarding female life expectancy, the hypothesis test found that female life expectancy at birth differs significantly between Botswana and Belgium. This suggests that Pfizer may need to distribute its age-related health products differently in Botswana than in Belgium. Botswana has a lower female life expectancy than Belgium, suggesting that the country has a greater need for the products and research that Pfizer develops.
Learning Goal 6 - Business Proposal
Based on my research on Argentina, Botswana, and Belgium, I have created a three-pronged business proposal that applies these results to improve Pfizer’s effectiveness and strengthen its global presence.
1. In Argentina, Pfizer can establish local research and development centers to strengthen its presence in South America and foster local production, increasing the availability of medications and making them more affordable for the local population.
2. In Belgium, Pfizer can better assert its presence by partnering with local healthcare administrations. Establishing these kinds of partnerships would allow Pfizer to build connections in Europe and get a better sense of what it can develop to address specific European needs.
3. In Botswana, Pfizer can open a branch of its medical research centers to target the health conditions that disproportionately affect women. This branch would allow Pfizer to have a significant impact on the community in Botswana by providing essential medications and preventive care solutions.
Learning Goal 6 - If results were different…
If my hypothesis tests had produced the opposite results, I would redirect resources and change my approach in Argentina and Belgium in particular. Opposite results would indicate a greater dependence on imported goods in Argentina, so there would be less reason to set up local production centers there. Instead, with regard to Argentina, Pfizer would focus on maintaining and strengthening ties with Argentine healthcare providers without investing directly in facilities in South America.
Regarding expansion in Belgium and Botswana, if the roles were reversed and Belgium had the lower female life expectancy at birth, I would allocate more resources to Belgium and focus on building a center serving the Belgian community. In Botswana, I would dedicate more resources to initiatives already established in the community.
Learning Goal 6 - Other data to collect
To create a more specific business proposal, I would collect this additional data:
● Healthcare spending patterns: I would track data on healthcare spending patterns to better understand how large the healthcare industry is in each region.
● Average Income: I would find a confidence interval for the average income in these regions so that I could understand the range within which the true population mean likely falls.
● Healthcare spending per capita: I would calculate the mean of a sample of data along with measures of variation, such as the variance and standard deviation, to understand how spread out the data are.
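The statistics proposed above fit in a few lines of Python. This is a minimal illustration with made-up income figures. describe() returns the mean, sample variance, and sample standard deviation mentioned for healthcare spending per capita, and z_confidence_interval() builds a large-sample interval x̄ ± z* · s / √n for average income. Both function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev, variance

def describe(sample):
    """Mean, sample variance, and sample standard deviation."""
    return mean(sample), variance(sample), stdev(sample)

def z_confidence_interval(sample, confidence=0.95):
    """Large-sample confidence interval for the population mean: x_bar +/- z* s / sqrt(n)."""
    x_bar = mean(sample)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * stdev(sample) / math.sqrt(len(sample))
    return x_bar - margin, x_bar + margin

# Hypothetical annual incomes (US dollars) from a sample of households.
incomes = [18500, 22000, 25400, 31000, 27800, 19900, 24300, 29100]
print(describe(incomes))
print(z_confidence_interval(incomes))
```

For small samples like this one, a t-based interval (replacing z* with a t critical value with n - 1 degrees of freedom) is the more standard choice.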