A presentation for DBAs and SQL users about how data scientists work. Data scientists often don't know in advance what data they will need to build their models, which makes it hard to architect databases around their (often transient) needs.
We present a workflow and a few examples to illustrate the core process of training machine learning models.
Data Science
The talk gives an overview of common topics in data science and how they relate to database administration.
The required data often live on multiple database systems, in structures that differ from the format the analytics algorithms require.
At large scale, the queries needed to reshape these data can put pressure on the existing database infrastructure and slow down data extraction.
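To make the mismatch concrete, here is a minimal sketch, assuming two hypothetical source systems: a customer table in one database and an event log in another, stored in long format (one row per event). Most model-training code instead expects one wide row per entity, so the data must be aggregated and joined across systems. The table names, columns, and event types are illustrative assumptions, and two in-memory SQLite databases stand in for the separate production systems.

```python
import sqlite3
from collections import defaultdict

# Two separate in-memory databases stand in for two hypothetical source systems.
crm_db = sqlite3.connect(":memory:")
events_db = sqlite3.connect(":memory:")

# Hypothetical customer attributes, one row per customer.
crm_db.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "north"), (2, "south")])

# Hypothetical event log in long format: one row per event, as a transactional system might store it.
events_db.execute("CREATE TABLE events (customer_id INTEGER, event_type TEXT)")
events_db.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "visit"), (1, "visit"), (1, "purchase"), (2, "visit")],
)


def build_feature_rows():
    """Aggregate events per customer and join them with customer attributes.

    Returns one dict per customer: the wide, one-row-per-entity layout
    that most model-training code expects.
    """
    # The GROUP BY runs on the source system; at scale, this is the kind of
    # query that adds load to production databases.
    counts = defaultdict(lambda: {"visit": 0, "purchase": 0})
    for customer_id, event_type, n in events_db.execute(
        "SELECT customer_id, event_type, COUNT(*) FROM events "
        "GROUP BY customer_id, event_type"
    ):
        counts[customer_id][event_type] = n

    # Join in application code, because the two tables live in different systems.
    rows = []
    for customer_id, region in crm_db.execute(
        "SELECT customer_id, region FROM customers ORDER BY customer_id"
    ):
        c = counts[customer_id]
        rows.append({
            "customer_id": customer_id,
            "region": region,
            "visits": c["visit"],
            "purchases": c["purchase"],
        })
    return rows


rows = build_feature_rows()
print(rows)
```

Because the join happens outside either database, neither system's schema reflects the shape the model needs. This is one reason such extractions are often rebuilt for each exploratory project.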
Ideally, these requirements would be considered during database design. However, many data science projects are exploratory and have a short lifespan.
Dr. Péter Molnár is a data scientist at RentPath, LLC and a faculty member at the Institute for Insight at Georgia State University.
As an academic and business professional, he advances and applies data science theories and tools in both the public and private sectors, including research in robotics, artificial intelligence, and machine learning.