This work addresses a privacy concern in testing social media applications: testing applications such as recommender systems requires input data from real users. We show how probability distributions, in particular the Weibull model, can be used to generate large collections of artificial unstructured data that simulate real data for testing recommender systems. We analyze data from two social media websites, user tweets on Twitter and comments made in discussion forums on Reddit; each dataset contains several thousand data instances.
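
As a rough illustration of the idea, the following minimal sketch draws synthetic post lengths from a Weibull distribution and fills them with placeholder words. It assumes, purely for illustration, that the modelled quantity is the number of words per post; the shape and scale values, the vocabulary, and the function names `synthetic_lengths` and `synthetic_post` are hypothetical and not taken from this work.

```python
import random


def synthetic_lengths(n, shape, scale, seed=0):
    """Draw n synthetic post lengths (in words) from a Weibull distribution."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    # Round to whole words and force at least one word per post.
    return [max(1, round(rng.weibullvariate(scale, shape))) for _ in range(n)]


def synthetic_post(length, vocabulary, rng):
    """Build a placeholder post by sampling words uniformly from a vocabulary."""
    return " ".join(rng.choice(vocabulary) for _ in range(length))


# Hypothetical vocabulary and parameters (assumptions, not fitted values).
VOCAB = ["data", "user", "post", "like", "share", "topic", "reply"]
rng = random.Random(1)
lengths = synthetic_lengths(5, shape=1.5, scale=20.0)
posts = [synthetic_post(k, VOCAB, rng) for k in lengths]
```

In practice the shape and scale would be estimated from the real corpus, and the uniform word sampling would be replaced by whatever text model suits the application under test. No real user text appears in the generated posts.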