Population and Sample

  • A population is the complete collection of similar objects that share some common parameter, whereas a sample is the group obtained by applying specific criteria or parameters to that population.

Illustration

  • Suman wants to buy a mobile. He searches the available mobiles and shortlists a few that fulfil his requirements based on various criteria, then compares the price and features of the shortlisted ones to buy the most appropriate.
  • Let us look at Suman’s problem step by step:
    • Step 1: Decision to buy a mobile
    • Step 2: Choose the correct Operating System (Android vs iOS)
    • Step 3: Select the appropriate Brand (Apple, Samsung, Sony, etc.)
    • Step 4: Select Appropriate Specifications
    • Step 5: Compare all the selections up to Step 4 based on Price
    • Step 6: Buy that One!

  • The above example shows the process of analysis and conclusion for a real, tangible item. In the world of data, the count of items runs into millions and billions, so processing and analysis at that scale are entirely different.
  • The population is the collection of specified groups of similar objects based on some common parameters. In the above example, a group of 2000 mobiles is a population based on Operating System.
  • The members of the group are known as Elements of the Population, e.g. mobiles are the elements of the population in the above example.
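Suman's steps can be sketched as a chain of filters over a list of handsets. This is a minimal illustration only: the `Handset` class, the five catalogue entries, and the RAM requirement are hypothetical stand-ins, not data from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handset:
    model: str
    os: str
    brand: str
    ram_gb: int
    price: int


# Hypothetical catalogue standing in for the population of mobiles.
population = [
    Handset("A1", "Android", "Samsung", 8, 700),
    Handset("A2", "Android", "Samsung", 4, 300),
    Handset("B1", "Android", "Sony", 8, 650),
    Handset("C1", "iOS", "Apple", 6, 900),
    Handset("C2", "iOS", "Apple", 8, 1100),
]


def shortlist(handsets, os, brands, min_ram_gb):
    """Apply Steps 2 to 4: each filter narrows the previous group."""
    by_os = [h for h in handsets if h.os == os]                 # Step 2
    by_brand = [h for h in by_os if h.brand in brands]          # Step 3
    return [h for h in by_brand if h.ram_gb >= min_ram_gb]      # Step 4


sample = shortlist(population, os="Android",
                   brands={"Samsung", "Sony"}, min_ram_gb=8)

# Step 5: compare only the shortlisted handsets on price.
best = min(sample, key=lambda h: h.price)

print(len(population), len(sample), best.model)
```

Price is compared only over the shortlisted handsets, so the expensive comparison runs on two elements instead of the whole catalogue.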

Types of Population

  • Finite Population: This kind of population has a countable number of elements, e.g. all flats in an apartment building.
  • Infinite Population: This kind of population has an unlimited number of elements, or so many that they cannot be counted in practice, e.g. the number of water drops in a glass of water.
  • Real Population: This kind of population has elements that actually exist, e.g. all the animals of a forest.
  • Hypothetical Population: This kind of population does not exist yet and can only be imagined or projected, e.g. the total population of the world in 2040.

Problems Associated with Population

  • A population usually denotes a big set containing a large number of elements. Doing any kind of analysis on a very large dataset is time-consuming and costly. Hence, a middle way is needed in which the characteristics of the whole population can be determined without analyzing the complete population.

Example 1: 

  • Why did Suman create so many filters? (OS, Brand, Specifications, etc.)
    • Solution: Analyzing the complete data of Step 1, i.e. 5000 handsets, is very difficult. Checking every specification of 10 handsets is possible, but doing so for 5000 handsets is impractical.

Example 2: 

  • Why did Suman compare the price of the last selection only?
    • Solution: For Suman, cost was not one of the major factors. Hence, he analyzed the prices only after the other important requirements were fulfilled, as analyzing Quality vs Price for every handset is impractical.

Sample

  • It is very difficult and time-consuming to do any kind of analysis on a whole population, e.g. checking the features of all handset models available globally.
  • However, as filters are applied as per the requirements, the number of available options keeps shrinking and analysis becomes easier. Hence, we can say that analysis is easier on a small group from a population than on the total population.
  • The group obtained upon applying specific criteria or parameters over a population is known as a Sample. In other words, it is a small piece taken from the large Population for the purpose of analysis.
  • In the above example, the handsets filtered at each stage are a Sample of the original Population, i.e. of the previous Step.
  • The sample is a part of the population selected based on some parameters/criteria to perform certain analyses. In other words, a sample is a subset of a population.
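To see why analyzing a subset can stand in for the whole, the sketch below compares the mean price of a population with the mean of a sample drawn from it. Note that it uses simple random sampling rather than the criteria-based filtering described above, and that the 5000 prices are randomly generated, hypothetical values.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: prices of 5000 handsets.
population_prices = [rng.randint(100, 1500) for _ in range(5000)]

# Sample: 200 prices drawn at random, without replacement.
sample_prices = rng.sample(population_prices, k=200)

population_mean = statistics.mean(population_prices)
sample_mean = statistics.mean(sample_prices)

print(f"population size: {len(population_prices)}, mean price: {population_mean:.1f}")
print(f"sample size:     {len(sample_prices)}, mean price: {sample_mean:.1f}")
```

The two means will usually be close, though not identical; how close depends on the sample size and on how the sample is chosen.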

Important Points Regarding Sample

  • A well-chosen sample reflects the characteristics of the population, hence the results of an analysis on such a sample can be taken as an estimate for the whole population.
  • The number of elements in a sample is known as the sample size.
  • Sampling Unit: It is a single element selected to analyze the characteristics of the whole sample, e.g. a student representing the whole school in a survey.
  • Sampling Frame: It is a list of all Sampling Units, e.g. a list of all students representing their schools for a survey.
  • Sample Space: It is a list of all possible outcomes expected from an analysis, e.g. 1-100 in the case of a survey to determine the age of a person who travels by train.
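The terms above can be mapped onto plain Python values. This is a rough sketch: the student names, school labels, and the assumption that valid ages run from 1 to 100 are all illustrative.

```python
# Hypothetical survey: one student represents each school.
# Sampling frame: the list of all sampling units.
sampling_frame = [
    "Asha (School A)",
    "Ravi (School B)",
    "Meena (School C)",
    "John (School D)",
]

# Sampling unit: a single entry of the frame.
sampling_unit = sampling_frame[0]

# Sample: the units actually surveyed (here, the first two, for illustration).
sample = sampling_frame[:2]
sample_size = len(sample)

# Sample space for a separate age survey: every age from 1 to 100 (assumed range).
sample_space = range(1, 101)


def is_possible_outcome(age):
    """Return True if the reported age lies inside the sample space."""
    return age in sample_space


print(sampling_unit, sample_size, is_possible_outcome(34), is_possible_outcome(140))
```

A check like `is_possible_outcome` is one simple way to flag survey answers that fall outside the expected outcomes.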