Data Scientist Tutorial
Big Data Analytics-Data Scientist
The role of a data scientist is normally associated with tasks such as predictive
modeling, developing segmentation algorithms, building recommender systems and
A/B testing frameworks, and often working with raw, unstructured data.
The nature of this work demands a deep understanding of mathematics, applied
statistics and programming. A data analyst and a data scientist have a few skills
in common, for example, the ability to query databases. Both analyze data, but the
decisions of a data scientist can have a greater impact on an organization.
Here is a set of skills a data scientist normally needs to have:
● Programming in a statistical language or package such as R, Python, SAS, SPSS, or Julia
● Ability to clean, extract, and explore data from different sources
● Research, design, and implementation of statistical models
● Deep statistical, mathematical, and computer science knowledge
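As a small illustration of the database-querying skill mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and exist only for this example; a real project would query whatever database the data architect has set up.

```python
import sqlite3


def revenue_by_segment(rows):
    """Load (segment, amount) rows into an in-memory table and aggregate them.

    Rows with a missing amount are skipped, which stands in for a very
    basic cleaning step before exploring the data.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table used only for this illustration.
        conn.execute("CREATE TABLE orders (segment TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT segment, COUNT(*), SUM(amount) FROM orders "
            "WHERE amount IS NOT NULL GROUP BY segment ORDER BY segment"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("retail", 120.0),
        ("retail", 80.0),
        ("wholesale", 500.0),
        ("retail", None),  # missing value, filtered out by the query
    ]
    for segment, n_orders, total in revenue_by_segment(sample):
        print(segment, n_orders, total)
```

The same pattern (extract with a query, filter out unusable rows, then summarize) carries over to other data sources, although the connection details will differ.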
In big data analytics, people often confuse the role of a data scientist with that
of a data architect. In reality, the difference is quite simple. A data architect
defines the tools and the architecture in which the data is stored, whereas a data
scientist uses this architecture. Of course, a data scientist should be able to set up
new tools if needed for ad-hoc projects, but the infrastructure definition and design
should not be part of the data scientist’s job.
Here are a few good laptop recommendations currently available:
● MacBook Pro, 15-inch model.
● I purchased a Lenovo Z510 about 3 years ago, with an i7 (3632QM) chip, 16 GB
RAM and an NVIDIA GPU, and it has served me well. It is still one of the better
machines on the market (in terms of performance).
● If you are based in the US and want something out of this world, you can
check out the Malibal 9000. It’s a beauty, if you can live with a bit of extra
weight.
Here is the tutorial for Data Science; you can go through it.
A few additional notes:
● Skylake chips (6th generation) from Intel were announced recently, and
machines based on them are just around the corner. I believe that they will
push the envelope once again. You can check out the Lenovo ThinkPad P50 and
P70 configurations as evidence. So, even if you have a moderate machine
today, I would recommend that you stay put for another 2 to 3 months and
then buy a machine based on a 6th generation quad-core chip.
● If you have to buy a machine today, it might be a good idea to stick with a 4th
generation quad-core i7 chip. There weren’t many options available with 5th
generation chips at the time of writing this article.
People might argue that you don’t need to invest in such an advanced machine, and
that you might be better off working with a mediocre machine over the cloud. I
personally like the accessibility provided by a personal machine and the fact that I
can start working anywhere without connecting to the internet.
Hardware – choice of your machine
The first thing to ensure is that you are on the right hardware for data science.
There is not much anyone can do if your hardware does not have what you need.
Since laptops are the mainstream computing device nowadays, my
recommendations below are for laptops. If you use a desktop / iMac, you can go
with an even better configuration.
While this choice will ultimately boil down to how much you can shell out for a
machine, I would recommend a machine with a quad-core processor, preferably an i7
(in the case of Intel chips). Make sure you check that the processor you choose is
quad core and not dual core. Lately, it has been really difficult to find good
quad-core chips. You can compare the benchmark performance of various chips in
your budget using sites like cpuboss.
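If you already have a machine at hand, a quick way to see what it reports is sketched below. This is only a rough check: Python's os.cpu_count() returns logical CPUs, so a quad-core chip with two hardware threads per core shows up as 8. The helper name `meets_core_target` and the `threads_per_core` value are assumptions made for this example; confirm the real core count against the chip's specification sheet.

```python
import os


def meets_core_target(logical_cpus, min_cores=4, threads_per_core=1):
    """Estimate physical cores from logical CPUs and compare with a target.

    threads_per_core is an assumption supplied by the caller (2 is common on
    chips with simultaneous multithreading); it is not detected here.
    """
    if logical_cpus is None:
        # os.cpu_count() can return None when the count is undetermined.
        return False
    return logical_cpus // threads_per_core >= min_cores


if __name__ == "__main__":
    logical = os.cpu_count()
    print("Logical CPUs reported:", logical)
    print(
        "Quad core or better (assuming 2 threads per core):",
        meets_core_target(logical, threads_per_core=2),
    )
```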
Next, maximize your RAM as far as your budget allows. A lot of tools hold data in
RAM during computations, and you don’t want to run out of RAM while running
them (you eventually will in some cases!).
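To get a feel for how quickly RAM fills up, the back-of-the-envelope estimate below may help. It assumes a purely numeric table stored at 8 bytes per value (a 64-bit float) and a hypothetical headroom factor of 3 to allow for the working copies many tools create; both numbers are illustrative assumptions, not measurements.

```python
def estimated_size_gb(n_rows, n_cols, bytes_per_value=8):
    """Rough in-memory size of a numeric table, in GiB."""
    return n_rows * n_cols * bytes_per_value / 1024 ** 3


def fits_in_ram(n_rows, n_cols, ram_gb, headroom=3.0, bytes_per_value=8):
    """Return True if the table, multiplied by a headroom factor, fits in RAM.

    headroom is an assumed multiplier for temporary copies made during
    computation; adjust it for the tools you actually use.
    """
    needed = estimated_size_gb(n_rows, n_cols, bytes_per_value) * headroom
    return needed <= ram_gb


if __name__ == "__main__":
    rows, cols = 10_000_000, 50
    print("Estimated size (GiB):", round(estimated_size_gb(rows, cols), 2))
    print("Fits in 16 GB:", fits_in_ram(rows, cols, ram_gb=16))
    print("Fits in 8 GB:", fits_in_ram(rows, cols, ram_gb=8))
```

Under these assumptions, a table of 10 million rows and 50 columns takes roughly 3.7 GiB on its own, which is comfortable on a 16 GB machine but tight on an 8 GB one once working copies are counted.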
If your budget allows, you should upgrade to an SSD, as read / write operations
on datasets will take a fraction of the time they take on a normal SATA hard disk.
For those who are really serious about learning machine learning and deep learning,
an NVIDIA GPU is recommended, so that you can run intensive computations
using CUDA.
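A simple first check for an NVIDIA setup is to look for the nvidia-smi command-line tool, which ships with the NVIDIA driver. The sketch below only tests whether that tool is on the PATH; finding it suggests a driver is installed, but it does not prove that the CUDA toolkit or any deep learning library is configured. The helper name `nvidia_smi_path` is made up for this example.

```python
import shutil


def nvidia_smi_path(which=shutil.which):
    """Return the path to nvidia-smi if it is on PATH, otherwise None.

    The lookup function is a parameter only so the check can be exercised
    without an NVIDIA driver installed.
    """
    return which("nvidia-smi")


if __name__ == "__main__":
    path = nvidia_smi_path()
    if path:
        print("nvidia-smi found at", path)
    else:
        print("nvidia-smi not found; an NVIDIA driver may not be installed")
```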
