Template-type: ReDIF-Paper 1.0
Author-Name: Vicente Nuñez-Antón
Author-Workplace-Name: Department of Applied Economics III (Econometrics and Statistics), Faculty of Economics and Business, University of the Basque Country 
	UPV/EHU, Bilbao (Spain).
Author-Name: Juan Manuel Pérez-Salamero González
Author-Workplace-Name: Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, Valencia (Spain).
Author-Name: Marta Regúlez-Castillo
Author-Workplace-Name: Department of Applied Economics III (Econometrics and Statistics), Faculty of Economics and Business, University of the Basque Country 
	UPV/EHU, Bilbao (Spain).
Author-Name: Carlos Vidal-Meliá
Author-Workplace-Name: Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, Valencia (Spain) and research 
	affiliation with the Instituto Complutense de Análisis Económico (ICAE), Complutense University of Madrid (Spain) and the Centre of Excellence in 
	Population Ageing Research (CEPAR), UNSW (Australia).
Title: Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives
Abstract: This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously 
	obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex 
	MINLP) and is therefore NP-hard. However, the solution is found by maximizing the “constant of proportionality” – in other words, maximizing the size 
	of the subsample taken from a stratified random sample with proportional allocation – and restricting it to a p-value high enough to achieve a good 
	fit to the population of interest using Pearson’s chi-square goodness-of-fit test. The beauty of the model is that it gives the user the freedom to 
	choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: the 
	Continuous Sample of Working Lives (CSWL), which is a set of anonymized microdata containing information on individuals from Spanish Social Security 
	records. Several waves (2005-2017) are first examined without using the model and the conclusion is that they are not representative of the target 
	population, which in this case is people receiving a pension income. The model is then applied and the results show that it is possible to obtain a 
	large dataset from the CSWL that (far) better represents the pensioner population for each of the waves analysed.
Classification-JEL: C61, C81, C12, H55, J26
Keywords: Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives.
Length: 30 pages 
Creation-Date: 2019-03
Number: 2019-20
X-File-Ref: http://america.sim.ucm.es/repec/ucm/ref/doicae1920.txt
File-URL: https://eprints.ucm.es/id/eprint/55423/1/1920.pdf
File-Format: Application/pdf
Handle: RePEc:ucm:doicae:1920