About

Hi, I am Georgios Douzas. I am a machine learning researcher at Nova IMS, University of Lisbon, and a member of the MagIC research and development center. My research areas are physics, mathematics and artificial intelligence, with multiple publications in machine learning and high-energy physics journals. My professional experience includes working as a software and machine learning engineer for various companies. Additionally, I often maintain or contribute to open-source projects.

Education

Ph.D. in Theoretical Particle Physics
National Technical University of Athens
09/03 - 09/08

M.Sc. in Physics
National Technical University of Athens
09/01 - 09/03

B.Sc. in Physics
National and Kapodistrian University of Athens
09/97 - 09/01

Experience

Machine Learning Engineer
Trasys, Greece
08/20 - Present

Designed and implemented Parallel Distribution, a software tool for the European Medicines Agency that applies OCR to PDF documents and generates a comparison report. The tool was implemented primarily in Python, using various text mining and machine learning libraries. The frontend used HTML, CSS, and JavaScript to provide an interactive dashboard of the results.

Machine Learning Researcher
Nova IMS, University of Lisbon, Portugal
09/13 - 09/14 & 09/18 - Present

Designed, implemented, and tested various new approaches to the class imbalance problem. Research focused on clustering-based over-sampling methods that deal with the within-class imbalance problem. Additionally, Geometric SMOTE, an extension of the SMOTE algorithm, was proposed and implemented. The final publication presented results showing a significant improvement over SMOTE and its variations. Deep learning models, particularly Conditional Generative Adversarial Networks (CGANs), were also used successfully as over-sampling methods. The implementation frameworks were TensorFlow, Keras, and PyTorch. The work has been published in high-impact machine learning journals, and the implementations of the above algorithms were made available as open-source software. Work in progress includes comparative experiments between variations of CGANs as over-samplers and the investigation of novel algorithms in the context of reinforcement learning.

Machine Learning Engineer
Tripsta, Greece
10/17 - 08/18

Designed and implemented the main parts of the company’s automated pricing system. These parts included machine learning estimators for the add-ons and the competitors’ prices, as well as the application of metaheuristic algorithms to the multi-objective budget optimization problem. The training data of the various estimators were on the order of terabytes, while the prediction time of the automated pricing system was required to be less than 100 ms for the incoming 50K requests/sec. The implementation languages were Python, Java, and Scala, while Spark, Dask, Scikit-Learn, and jMetal were used as distributed data processing, machine learning, and optimization frameworks/libraries.

Data Scientist
Quantum Retail, Remote
12/16 - 09/17

Worked on demand forecasting and clustering for retail companies. Proposed and applied machine learning methods to improve the company’s main forecasting solution, which was based on exponential smoothing of the time series data and adjustments guided by a seasonality curve. Boosted trees were selected as the final machine learning model. Feature extraction that integrated the business logic, combined with extensive model hyperparameter tuning, improved the forecasting precision by 30% compared to the original model.

Machine Learning Engineer
CERN, Remote
05/16 - 09/16

Developed the parallelization of various features for TMVA, the Toolkit for Multivariate Data Analysis with ROOT, as part of a project funded by Google. ROOT is the main framework developed by CERN for the big data processing, statistical analysis, visualization, and storage of the massive amounts of data produced by particle physics experiments. The legacy version was implemented in C++. The parallelized features included the application of brute-force and metaheuristic algorithms to the hyperparameter grid search of machine learning algorithms. The implementation was based on Python and Spark.

Scientific Software Engineer
IRI, Greece
01/14 - 05/16

Member of IRI’s “Solutions and Innovation Team” (R&D), working on the company’s transition towards open source and elastic computing. Participated in an agile team migrating IRI’s leading US “Price & Promo Analytics” solution, which generated more than $25M in annual revenue, to Hadoop distributed storage and Spark cluster computing. Python was the core language of the implementation, but integration with R and Julia was performed to leverage unique functionality. The legacy version was implemented in SAS. The project’s main objectives were the design of the parallelization schema, the enhancement of data manipulation with the use of distributed processing, and the migration of the statistical modeling algorithms (regression mixed models). The final system processed 5 years of data for more than 300 categories containing 1 million products.