This document provides an overview of the role and skills required of a data scientist. It discusses the types of tasks data scientists perform such as predictive modeling, segmentation algorithms, and A/B testing. Data scientists need a strong understanding of mathematics, statistics, and programming. The document also contrasts the roles of data scientists and data architects, noting that data scientists use the infrastructure defined by data architects. It provides recommendations for laptop hardware suitable for data science work, emphasizing the importance of a quad-core processor, RAM, SSD storage, and potentially a discrete GPU.
Data Scientist Hardware and Skills
Data Scientist Tutorial
Big Data Analytics-Data Scientist
The role of a data scientist is normally associated with tasks such as predictive
modeling, developing segmentation algorithms, recommender systems, A/B testing
frameworks and often working with raw unstructured data.
The nature of their work demands a deep understanding of mathematics, applied
statistics and programming. A data analyst and a data scientist share a few
skills, for example, the ability to query databases. Both analyze data, but the
decisions of a data scientist can have a greater impact on an organization.
Here is a set of skills a data scientist normally needs to have:
● Programming in a statistical package or language such as R, Python, SAS, SPSS, or Julia
● Able to clean, extract, and explore data from different sources
● Research, design, and implementation of statistical models
● Deep statistical, mathematical, and computer science knowledge
In big data analytics, people often confuse the role of a data scientist with that
of a data architect. In reality, the difference is quite simple. A data architect
defines the tools and the architecture in which the data is stored, whereas a data
scientist uses this architecture. Of course, a data scientist should be able to set up
new tools if needed for ad-hoc projects, but defining and designing the
infrastructure should not be part of their job.
Here are a few good recommendations available currently:
● MacBook Pro, 15-inch model.
● I purchased a Lenovo Z510 about 3 years back, with an i7 (3632QM) chip,
16 GB RAM and an NVIDIA GPU, and it has served me well. It is still one of
the better machines on the market (in terms of performance).
● If you are based in the US and want something out of this world, you can
check out the Malibal 9000. It's a beauty, if you can live with a bit of extra
weight.
Here is the tutorial for data science; you can go through it.
A few additional notes:
● Skylake chips (6th generation) from Intel were announced recently and
machines based on them are just around the corner. I believe that they will
push the envelope once again. You can check out the Lenovo ThinkPad P50 and
P70 configurations as evidence. So, even if you have a moderate machine
today, I would recommend that you stay put for another 2 to 3 months and
then buy a machine based on a 6th generation quad-core chip.
● If you have to buy a machine today, it might be a good idea to stick with a 4th
generation quad-core i7 chip. There weren't many options available with 5th
generation chips at the time of writing this article.
People might argue that you don't need to invest in such an advanced machine,
and that you might be better off using a mediocre machine to work over the cloud. I
personally like the accessibility provided by a personal machine and the fact that I can
start working anywhere without connecting to the internet.
Hardware – choice of your machine
The first thing to ensure is that you are on the right hardware for data science.
There is not much anyone can do if your hardware does not have what you
need. Since laptops are the mainstream computing device nowadays, my
recommendations below are for laptops. If you use a desktop / iMac, you can go
with an even better configuration.
While this choice will ultimately boil down to how much you can shell out for a
machine, I would recommend a machine with a quad-core processor, preferably an i7
(in the case of Intel chips). Make sure you check that the processor you choose is
quad core and not dual core. Lately, it has been really difficult to find good quad-core
chips. You can check the benchmark performance of various chips in your budget
against each other using sites like cpuboss.
Next, it is always recommended to maximize your RAM to the extent possible. A lot
of tools use RAM for computations and you don’t want to run out of RAM while
doing them (you eventually will in some cases!).
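To get a feel for how quickly RAM runs out, here is a back-of-the-envelope sketch. It assumes a dense table of 8-byte numeric values (typical for 64-bit floats) and a headroom factor of 3 for the working copies that tools tend to create. Both numbers, and the function names, are my own illustrative assumptions rather than hard rules.

```python
def estimated_size_gb(n_rows, n_cols, bytes_per_value=8):
    """Approximate in-memory size of a dense numeric table, in GiB."""
    return n_rows * n_cols * bytes_per_value / 1024**3


def fits_in_ram(n_rows, n_cols, ram_gb, headroom=3.0):
    """Rough check: does the table, times an assumed headroom, fit in RAM?"""
    # headroom is an assumed multiplier for intermediate copies made
    # during cleaning and modeling; adjust it to your own workload.
    return estimated_size_gb(n_rows, n_cols) * headroom <= ram_gb


if __name__ == "__main__":
    # 50 million rows x 20 numeric columns on a 16 GB machine
    print(round(estimated_size_gb(50_000_000, 20), 2))
    print(fits_in_ram(50_000_000, 20, ram_gb=16))
```

Under these assumptions, 50 million rows of 20 numeric columns already take about 7.45 GiB before any copies are made, so a 16 GB machine would struggle with it.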
If your budget allows, you should upgrade to an SSD, as your read / write operations
on datasets will take a fraction of the time they take on a conventional hard disk. For
those who are really serious about machine learning and deep learning, it
is recommended to have an NVIDIA GPU, so that you can run intense computations
using CUDA.
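As a quick hint of whether a machine is set up for NVIDIA GPU work, you can look for the nvidia-smi command-line tool, which is installed along with the NVIDIA driver. This is a minimal sketch and the function name is made up for illustration. Finding the tool on the PATH only suggests that a driver is installed; it does not confirm that a CUDA-capable GPU is present or usable.

```python
import shutil


def nvidia_driver_tools_present():
    """Return True if the nvidia-smi tool is found on the PATH.

    Hypothetical helper: presence of the tool is only a hint that an
    NVIDIA driver is installed, not proof of a working CUDA setup.
    """
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print(nvidia_driver_tools_present())
```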