Hypothesis testing, which is closely associated with the scientific method, provides a basis for accepting or rejecting ideas or theories that are presented as either true or false. More precisely, hypothesis testing expresses how likely a theory is to be true or false in terms of degrees of confidence: we can be 95% confident that a theory X is true, or 99% confident, but we can never be 100% confident. That is, you can never be “certain”. If someone has a theory or an idea about the economy, investing, or markets, you can use hypothesis testing to check whether the evidence supports it.
The field of statistical inference, in which conclusions about a population are drawn from observing subsets of the larger group, is generally divided into two areas: estimation and hypothesis testing. With estimation, the focus is on determining the value of a parameter, with a degree of confidence. With hypothesis testing, the focus is on verifying whether a stated value of a parameter is true or not (again, with a degree of confidence).
What is a Hypothesis?
A hypothesis is a statement made about a population parameter. These are typical hypotheses: "the mean annual return of this mutual fund is greater than
12%", and "the mean return is greater than the average return for the category".
Hypothesis testing seeks to answer seven questions:
1. What is the null hypothesis and the alternative hypothesis?
2. Which test statistic is appropriate, and what is the probability distribution?
3. What is the required level of significance?
4. What is the decision rule?
5. Based on the sample data, what is the value of the test statistic?
6. Do we reject or fail to reject the null hypothesis?
7. Based on our rejection or inability to reject, what is our investment or economic decision?
The alternative hypothesis (denoted by H_{a}) is our original hypothesis, and the null hypothesis (denoted by H_{0}) is the negation of the alternative hypothesis. For the hypothesis “the mean annual return of this mutual fund is greater than 12%”, the two are:
H_{a}: The mean annual return of this mutual fund is greater than 12%
H_{0}: The mean annual return of this mutual fund is less than or equal to 12%
We test the null hypothesis. If the test statistic falls within the range implied by our chosen confidence level, we fail to reject the null hypothesis. If it falls outside that range, we reject the null hypothesis (which is what we would hope for, since the alternative is our original hypothesis). Note that we never “accept” the null hypothesis; we simply fail to reject it. This terminology comes from our objective: we are trying to reject the null hypothesis, so we either succeed in rejecting it or fail to reject it.
One-Tailed Test
The “one-tailed” and “two-tailed” tests refer to the tails of the standard normal distribution. The keywords for identifying a one-tailed test are “greater than” or “less than”. For our example hypothesis we use a one-tailed test in which the null hypothesis will be rejected based only on observations in the right tail (returns sufficiently higher than 12%).
Two-Tailed Test
A two-tailed test is characterized by the words "equal to" or "not equal to". For example, if our hypothesis were that the return on a mutual fund is equal to 8%, we could reject it based on observations in either tail (sufficiently higher than 8% or sufficiently lower than 8%).

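Choosing the null and the alternative hypothesis:
If θ (theta) is the actual value of a population parameter (e.g. mean or standard deviation), and θ_{0} (theta sub-zero) is the value of theta according to our hypothesis, the null and alternative hypotheses can be formed in three different ways:
1. H_{0}: θ = θ_{0} versus H_{a}: θ ≠ θ_{0} (two-tailed)
2. H_{0}: θ ≤ θ_{0} versus H_{a}: θ > θ_{0} (one-tailed, right tail)
3. H_{0}: θ ≥ θ_{0} versus H_{a}: θ < θ_{0} (one-tailed, left tail)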

Choosing which statement will be the null and which will be the alternative depends on the case and on what we wish to prove. There are usually two ways to assign them, but in most cases it is preferable to make the null the statement we believe we can reject, and then attempt to reject it. For example, in our case of a one-tailed test with the return hypothesized to be greater than 8%, we could make the greater-than case the null (with the alternative being less than or equal to), or we could make the greater-than case the alternative (with less than or equal to as the null). Which should we choose? A hypothesis test is typically designed to look for evidence that may reject the null. So in this case, we would make the null hypothesis "the return is less than or equal to 8%", which means we are looking for observations in the right tail. If we reject the null, the evidence supports the alternative, and we conclude the fund is likely to return more than 8%.
In hypothesis testing, a test statistic is defined as a quantity taken from a sample that is used as the basis for testing the null hypothesis (rejecting or failing to reject the null).
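Calculating a test statistic will vary based upon the case and our choice of probability distribution (for example, a t-test or a z-test). The general format of the calculation is:
Formula 2.36
Test statistic = (sample statistic - value of the population parameter under H_{0}) / (standard error of the sample statistic)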

Type I and Type II Errors
The significance level is similar in concept to the confidence level associated with estimating a parameter: both involve choosing the probability of making an error (denoted by α, or alpha), with lower alphas reducing the probability of error. In the case of estimators, the trade-off of reducing this error was to accept a wider (less precise) confidence interval. In the case of hypothesis testing, choosing lower alphas also involves a trade-off, in this case increasing a second type of error.
Errors in hypothesis testing come in two forms: Type I and Type II. A type I error is defined as rejecting the null hypothesis when it is true. A type II error is defined as not rejecting the null hypothesis when it is false. As the table below indicates, these errors represent two of the four possible outcomes of a hypothesis test:
Decision                 H_{0} is true         H_{0} is false
Do not reject H_{0}      Correct decision      Type II error
Reject H_{0}             Type I error          Correct decision

The reason for separating type I and type II errors is that their consequences differ from case to case: in some situations a type I error has serious consequences, while in others it is the type II error that most needs to be avoided. It is important to understand which type matters more in the case at hand.
Denoted by α, or alpha, the significance level is the probability of making a type I error, that is, the probability that we will reject the null hypothesis when it is true. So if we choose a significance level of 0.05, it means there is a 5% chance of making a type I error. A 0.01 significance level means there is just a 1% chance of making a type I error. As a rule, a significance level is specified prior to calculating the test statistic. Otherwise, the analyst conducting the research might let the result of the test statistic calculation influence the choice of significance level (prompting a change to a higher or lower level). Such a change would take away from the objectivity of the test.
While any level of alpha is permissible, in practice the significance level is likely to be one of three values: 0.10 (semi-strong evidence for rejecting the null hypothesis), 0.05 (strong evidence), and 0.01 (very strong evidence). Why wouldn't we always opt for 0.01 or even lower probabilities of type I errors? Isn't the idea to reduce and eliminate errors? In hypothesis testing, we have to control two types of errors, with a trade-off: when one type is reduced, the other type is increased. In other words, by lowering the chances of a type I error, we must reject the null less frequently, including when it is false (a type II error). Quantifying this trade-off is difficult in practice because the probability of a type II error (denoted by β, or beta) is not easy to define (i.e. it changes for each possible true value of θ). Only by increasing sample size can we reduce the probability of both types of errors.
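To make the trade-off concrete, here is a minimal sketch that computes β for one assumed true value of the mean. It assumes a right-tailed z-test (null: mean less than or equal to the hypothesized value) with a known population standard deviation; the function name `type_ii_error` and all the numbers are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def type_ii_error(alpha, true_mean, hypothesized_mean, sigma, n):
    """Beta for a right-tailed z-test, given one assumed true mean.

    Assumption: sigma is the known population standard deviation.
    """
    standard_error = sigma / math.sqrt(n)
    # Rejection point for a right-tailed test at significance level alpha.
    critical_z = STD_NORMAL.inv_cdf(1 - alpha)
    # How far the true mean sits from the hypothesized mean, in standard errors.
    shift = (true_mean - hypothesized_mean) / standard_error
    # Probability that the test statistic falls short of the rejection point.
    return STD_NORMAL.cdf(critical_z - shift)


# Hypothetical case: hypothesized mean 12%, true mean 14%, sigma 5%.
beta_05 = type_ii_error(0.05, 14.0, 12.0, 5.0, n=25)
beta_01 = type_ii_error(0.01, 14.0, 12.0, 5.0, n=25)
beta_01_large = type_ii_error(0.01, 14.0, 12.0, 5.0, n=100)
print(f"alpha=0.05, n=25:  beta={beta_05:.3f}")
print(f"alpha=0.01, n=25:  beta={beta_01:.3f}")
print(f"alpha=0.01, n=100: beta={beta_01_large:.3f}")
```

Under these assumptions, lowering alpha from 0.05 to 0.01 raises β, while a larger sample lowers β at the same alpha. A different assumed true mean would give a different β, which is why a single figure is hard to state in general.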
The decision rule (step #4) is crafted by comparing two values: (1) the calculated value of the test statistic, which we will complete in step #5, and (2) a rejection point, or critical value (or values), that is a function of our significance level and the probability distribution being used in the test. If the calculated value of the test statistic is as extreme as (or more extreme than) the rejection point, then we reject the null hypothesis and state that the result is statistically significant. Otherwise, if the test statistic does not reach the rejection point, then we cannot reject the null hypothesis and we state that the result is not statistically significant. A rejection point depends on the probability distribution, on the chosen alpha, and on whether the test is one-tailed or two-tailed.
For example, suppose we are able to use the standard normal distribution (the z-value), we choose an alpha of 0.05, and we have a two-tailed test (i.e. we reject the null hypothesis when the test statistic is either sufficiently above or sufficiently below zero). The two rejection points are taken from the z-values for the standard normal distribution: below -1.96 and above +1.96. Thus if the calculated test statistic is in either of these two rejection ranges, the decision would be to reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.
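The rejection points quoted above can be reproduced from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `rejection_points` and its `tails` argument are illustrative choices, not part of any established API.

```python
from statistics import NormalDist


def rejection_points(alpha, tails="two"):
    """Return (lower, upper) z rejection points; None means that tail is unused."""
    z = NormalDist()  # standard normal distribution
    if tails == "two":
        # Split alpha evenly between the two tails.
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tails == "right":
        return (None, z.inv_cdf(1 - alpha))
    if tails == "left":
        return (z.inv_cdf(alpha), None)
    raise ValueError("tails must be 'two', 'right', or 'left'")


lower, upper = rejection_points(0.05, tails="two")
print(f"two-tailed, alpha=0.05: {lower:.2f} and +{upper:.2f}")
```

For a one-tailed test at the same alpha, the whole 5% sits in one tail, so the single rejection point is closer to zero (about 1.645 in absolute value).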
The power of a hypothesis test refers to the probability of correctly rejecting the null hypothesis when it is false. There are two possible outcomes when the null hypothesis is false: either we (1) reject it (as we correctly should) or (2) fail to reject it, and make a type II error. Thus the power of a test is equal to 1 minus beta (β), the probability of a type II error. Since beta is usually not quantified, neither is the power of a test. For hypothesis tests, it is sufficient to specify the significance level, or alpha. However, given a choice between more than one test statistic (for example, z-test, t-test), we will always choose the test with the greater power, all other factors equal.
Confidence intervals, as a basis for estimating population parameters, were constructed as a function of "number of standard deviations away from the mean". For example, for 95% confidence that our interval will include the population mean (μ), when we use the standard normal distribution (z-statistic), the interval is: (sample mean) ± 1.96 * (standard error), or, equivalently, -1.96 * (standard error) < (sample mean) - μ < +1.96 * (standard error).
Hypothesis tests, as a basis for testing the value of population parameters, are also set up to reject or not reject based on "number of standard deviations away from the mean". The basic structure for testing the null hypothesis at the 5% significance level, again using the standard normal, is that the null is not rejected when -1.96 < [(sample mean - hypothesized population mean) / standard error] < +1.96, or, equivalently, -1.96 * (std. error) < (sample mean) - (hypo. pop. mean) < +1.96 * (std. error).
In hypothesis testing, we essentially create an interval within which the null will not be rejected, and we are 95% confident in this interval (i.e. there's a 5% chance of a type I error). By slightly rearranging terms, the structure for a confidence interval and the structure for rejecting/not rejecting a null hypothesis appear very similar, an indication of the relationship between the concepts.
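The relationship can be checked numerically. The following sketch, with hypothetical function names and made-up figures, builds a 95% confidence interval around a sample mean and runs the matching two-tailed test; under the standard normal assumption, a hypothesized mean lies inside the interval exactly when the test fails to reject it.

```python
def confidence_interval(sample_mean, standard_error, z=1.96):
    """95% interval by default: sample mean +/- z * standard error."""
    margin = z * standard_error
    return (sample_mean - margin, sample_mean + margin)


def fails_to_reject(sample_mean, hypothesized_mean, standard_error, z=1.96):
    """True when the test statistic lies strictly between -z and +z."""
    stat = (sample_mean - hypothesized_mean) / standard_error
    return -z < stat < z


# Hypothetical figures: sample mean 10%, standard error 1%.
low, high = confidence_interval(10.0, 1.0)
for hypothesized in (8.0, 9.0, 11.0, 12.5):
    inside = low < hypothesized < high
    not_rejected = fails_to_reject(10.0, hypothesized, 1.0)
    print(hypothesized, "inside interval:", inside, "| not rejected:", not_rejected)
```

In each row the two answers agree, which is the rearrangement of terms described above.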
The statistical decision (step #6) compares the calculated test statistic to the value computed as the rejection point; that is, it carries out the decision rule created in step #4. For example, with a significance level of 0.05, using the standard normal distribution, on a two-tailed test (i.e. the null is "equal to"; the alternative is "not equal to"), we have rejection points below -1.96 and above +1.96. If our calculated test statistic [(sample mean - hypothesized mean) / standard error] = 0.6, then we cannot reject the null hypothesis. If the calculated value is 3.6, we reject the null hypothesis and accept the alternative.
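The two cases above can be written as a small decision function. This is only a sketch: the name `statistical_decision` is hypothetical, and the default rejection points assume a two-tailed test on the standard normal at a 0.05 significance level.

```python
def statistical_decision(test_statistic, lower=-1.96, upper=1.96):
    """Apply the decision rule: reject if the statistic reaches either rejection point."""
    if test_statistic <= lower or test_statistic >= upper:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


for value in (0.6, 3.6):
    print(value, "->", statistical_decision(value))
```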
The final step, or step #7, involves making the investment or economic decision (i.e. the real-world decision). In this context, the statistical decision is but one of many considerations. For example, take a case where we created a hypothesis test to determine whether a mutual fund outperformed its peers in a statistically significant manner. For this test, the null hypothesis was that the fund's mean annual return was less than or equal to a category average; the alternative was that it was greater than the average. Assume that at a significance level of 0.05, we were able to establish statistical significance and reject the null hypothesis, thus accepting the alternative. In other words, our statistical decision was that this fund would outperform its peers, but what is the investment decision? The investment decision would likely take into account (for example) the risk tolerance of the client and the volatility (risk) measures of the fund, and it would assess whether transaction costs and tax implications make the investment worth making. In other words, rejecting or not rejecting a null hypothesis does not automatically require that a decision be carried out; thus there is a need to assess the statistical decision and the economic or investment decision in two separate steps.