Data Science

Methods of generating value from data:

Predictive modelling

We have the most experience in predictive modelling, and our biggest projects so far have been in this field. They involved:

  1. Customer attrition (churn) modelling - finance industry (several million observations) - ongoing project (since the second quarter of 2017)
  2. Default probability for structural credit risk - finance industry (several million observations) - first half of 2017
  3. Bookmaker risk analysis (several million observations) - second half of 2017

During these projects, we applied various algorithms, including random forests, neural networks, logistic regression, stochastic gradient boosting and support vector machines.


Clustering analysis

We have run clustering analyses mostly in the retail trade and construction industries (two separate projects at the turn of 2017 and 2018). We looked for the optimal division of clients, products and vendors into homogeneous groups.

We usually used the K-means and K-medoids algorithms, as well as hierarchical clustering. The analyses were carried out on data sets of up to a couple of thousand observations.
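
As an illustration of the K-means idea only, here is a minimal sketch in plain Python. It is not the code used in the projects: the sample points, the function names and the farthest-first initialisation are assumptions chosen to keep the example small and deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Assumed initialisation: start from the first point, then repeatedly
    # add the point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-feature records (for example, scaled spend and order count).
sample = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(sample, k=2)
print(labels)
```

On real data the features would normally be scaled first, and the number of groups would be chosen by comparing several values of k.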

Association analysis

We carried out three such projects in the retail business. The analyses covered data sets of up to a couple of thousand observations. One project took place at the start of 2017, and the other two at the turn of 2017 and 2018.

Simulation modelling

We built a model for optimizing processes and costs in the face of uncertainty. The project was for a manufacturing company that had limited knowledge of the future values of its parameters.

Optimizing the operations required estimates based on multiple simulations. The data set comprised several million observations. This project took place in the second half of 2017.
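
The general approach of estimating costs from many simulated scenarios can be sketched as follows. This is a toy example, not the project's model: the demand distribution, the cost parameters and the function names are all assumptions made for illustration.

```python
import random


def expected_cost(quantity, demand_samples, unit_cost, shortage_penalty):
    """Average cost of producing `quantity` units across simulated demands."""
    total = 0.0
    for demand in demand_samples:
        shortage = max(demand - quantity, 0.0)  # unmet demand in this scenario
        total += unit_cost * quantity + shortage_penalty * shortage
    return total / len(demand_samples)


def best_quantity(candidates, demand_samples, unit_cost, shortage_penalty):
    """Pick the candidate quantity with the lowest simulated average cost."""
    return min(
        candidates,
        key=lambda q: expected_cost(q, demand_samples, unit_cost, shortage_penalty),
    )


# Assumed uncertainty: demand is roughly normal with mean 100 and std. dev. 20.
rng = random.Random(42)  # fixed seed so the run is repeatable
demand_samples = [max(rng.gauss(100.0, 20.0), 0.0) for _ in range(10_000)]

chosen = best_quantity(
    range(50, 201, 5), demand_samples, unit_cost=2.0, shortage_penalty=10.0
)
print("production quantity with the lowest simulated cost:", chosen)
```

Because every candidate is scored on the same set of simulated scenarios, differences between candidates reflect the decision rather than sampling noise.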

Text mining

One example of a text mining project involved filling in missing data in incomplete data sets and correcting invalid strings against an available dictionary of permitted values.

The project was carried out in the pharmaceutical industry in the second half of 2016, and the data sets involved up to several tens of thousands of observations.
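
Correcting strings against a dictionary of permitted values could look roughly like the sketch below, which uses fuzzy matching from Python's standard `difflib` module. The dictionary entries, the similarity cutoff and the `correct_value` helper are hypothetical, and the matching method is an assumption rather than a description of what the project actually used.

```python
import difflib


def correct_value(raw, dictionary, cutoff=0.8):
    """Return the closest permitted value, or None if nothing is similar enough."""
    if raw is None or not raw.strip():
        return None  # a missing value cannot be corrected by string matching

    cleaned = raw.strip().lower()
    lookup = {entry.lower(): entry for entry in dictionary}

    if cleaned in lookup:  # exact match after normalising case and whitespace
        return lookup[cleaned]

    # Fuzzy match: take the single most similar entry above the cutoff.
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# Hypothetical dictionary of permitted values.
permitted = ["Paracetamol", "Ibuprofen", "Amoxicillin"]

for raw in ["paracetamool", " IBUPROFN ", "xyz", ""]:
    print(repr(raw), "->", correct_value(raw, permitted))
```

Values that come back as `None` are left for manual review rather than being guessed, and the cutoff controls how aggressive the automatic correction is.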

Andrzej Gut, Big data specialist