Data Skills for the Digital Era
The Top Data Skills You Need To Get Hired
Main Focus
Data Science
Business Intelligence
Big Data
Data Engineering
Mohtat@ut.ac.ir 2
Data Science
Math & Statistics
Computer Science
Subject Matter Expertise
Data science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data. It continues several earlier data analysis fields, such as statistics, data mining, and predictive analytics, and is similar to Knowledge Discovery in Databases (KDD).
Types of Analytics
Descriptive: what happened?
Diagnostic: why did it happen?
Predictive: what is likely to happen?
Prescriptive: what should be done about it?
Data Science
Technology
Application
Critical Skills for Data Scientists
Python
R
SQL
Data Mining Tools
KNIME, RapidMiner, IBM SPSS Modeler
Excel
BI Tools
Tableau, Power BI, Qlik
Top Python Libraries in Data Science
TensorFlow
“TensorFlow is an open source software library for numerical computation using data flow graphs.”
PyTorch
“PyTorch is a Python package that provides deep neural networks built on a tape-based autograd system.”
NumPy
“NumPy is the fundamental package needed for scientific computing with Python.”
Scikit-Learn
“scikit-learn is a Python module for machine learning built on NumPy, SciPy and matplotlib.”
Keras
“Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.”
SciPy
“SciPy is open-source software for mathematics, science, and engineering.”
Pandas
“pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.”
Matplotlib
“Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.”
Scrapy
“Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.”
Top Skills Every Data Scientist Needs to Master
TensorFlow
Keras
Hadoop
Spark
Hive
Java
MATLAB
Most Essential Skills for Data Scientists
Complex Problem Solving
Team Working
Emotional Intelligence
Creativity
Critical Thinking
Negotiation
Applied Data Science with Python
University of Michigan (Coursera)
Basic: Introduction to Data Science in Python
Data Visualization: Applied Plotting, Charting & Data Representation in Python
Machine Learning: Applied Machine Learning in Python
Text Mining: Applied Text Mining in Python
SNA (Social Network Analysis): Applied Social Network Analysis in Python
Data Science Books
Business Intelligence (BI) encompasses a wide variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources; prepare it for analysis; develop and run queries against that data; and create reports, dashboards and data visualizations to make the analytical results available to corporate decision-makers as well as operational workers.
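The BI workflow described above (collect, prepare, query, report) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sales rows are hypothetical, and an in-memory SQLite database (the standard-library `sqlite3` module) stands in for a real data warehouse. Tools such as Tableau or Power BI perform the same steps at scale through a graphical interface.

```python
import sqlite3

# Hypothetical sales records standing in for data collected from source systems
raw_rows = [
    ("2024-01", "North", " 1200 "),
    ("2024-01", "South", "950"),
    ("2024-02", "North", "1300"),
    ("2024-02", "South", None),  # incomplete record, removed during preparation
    ("2024-02", "South", "1010"),
]

# Prepare: drop incomplete rows and convert amounts from text to numbers
clean_rows = [
    (month, region, float(amount))
    for month, region, amount in raw_rows
    if amount is not None
]

# Load the prepared data into an in-memory database so it can be queried
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

# Query: total sales per region, the kind of figure a dashboard would display
report = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Report: print a simple text table for decision-makers
for region, total in report:
    print(f"{region:<6} {total:>8.2f}")
conn.close()
```

The preparation step is deliberately simple here; in practice this is where ETL tools handle deduplication, type conversion, and data-quality rules before loading.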
Business Intelligence Architect
Business Skills
Link to Business Strategy
Define Priorities
Define BI Vision
Lead Organization / BPR (Business Process Reengineering)
Analytics Skills
Data Mining
Social BI
IT Skills
Infrastructure
Build Technology
Data Integration & Quality
Simplicity is what business needs.
Top Business Intelligence Skills
SQL
Data Warehousing
Data Analysis
Tableau
ETL
[Chart values: 23%, 85%, 28%, 41%, 65%, 28%]
Top Business Intelligence Skills
Business Analyst
Oracle
SQL Server BI
Business Process
Data Modeling
[Chart values: 17%, 85%, 19%, 21%, 22%, 19%]
Top Business Intelligence Tools
Tableau
Power BI
Qlik
Your Choice Is Clear
Big Data
Volume: Terabytes, Distributed, Big Table
Velocity: Real-time, Stream Processing
Variety: Structured, Unstructured (Text, Image, Video)
Big data refers to data sets that are too large or complex for traditional data-processing application software to handle adequately. What matters is what organizations do with the data: big data can be analyzed for insights that lead to better decisions and strategic business moves.
Hadoop Ecosystem
3 Types of Big Data Jobs
1. Big Data Developer
2. Big Data Administrator
3. Big Data Analyst
Top Big Data Programming Languages
Java
Hadoop is written in Java, and many other big data tools, such as Storm, Spark, and Kafka, also run on the JVM.
Python
Python is a simple, open-source, general-purpose language, which makes it easy for anyone to learn. Its rich set of libraries and easy-to-use features make it well suited to big data processing and analysis.
Scala
Scala rivals Java and Python in data science and is becoming increasingly popular because Apache Spark, which is written in Scala, is widely used in the big data (Hadoop) industry.
Pathway to Success
Success
Apache Hadoop
Apache Spark
Start
NoSQL Database
Data Analytics
Data Visualization
Big Data Companies & Vendors
Cloudera
Cloudera, Inc. is a US-based software company that provides a software platform for data engineering, data warehousing, machine learning and analytics that runs in the cloud or on premises.
MapR
MapR is a business software company headquartered in Santa Clara, California. MapR provides access to a variety of data sources, including big data workloads, from a single computer cluster.
Hortonworks
Hortonworks is a data software company based in Santa Clara, California, that develops, supports, and provides expertise on a set of open-source software designed to manage data and processing for use cases such as IoT, single view of X, and advanced analytics and machine learning.
Installing and Running Big Data Infrastructure
Big Data Specialization
University of California San Diego (Coursera)
Introduction to Big Data
Big Data Modeling and Management Systems
Big Data Integration and Processing
Machine Learning With Big Data
Graph Analytics for Big Data
Apache Spark
University of California, Berkeley
Big Data Book
Data Scientist vs. Data Engineer
Data Engineer
Data Scientist
Data Pipelines
Visualization & Storytelling
Programming
Modeling & Advanced Analytics
Math & Statistics
System Implementation
How To Become A Data Engineer
Linux
NoSQL & SQL
Python / Java / Scala
Agile Development
Data Ingestion
Processing Frameworks
Best Data Processing Frameworks
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework.
Apache Storm is a free and open-source distributed real-time computation system.
The core of Apache Flink is a distributed streaming dataflow engine written in Java and Scala.
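The MapReduce model described above splits work into a map step that emits key/value pairs, a shuffle that groups values by key, and a reduce step that combines each group. The classic word-count example below is a single-process Python sketch of that flow; it is not Hadoop's API, and the function names and input documents are hypothetical. On a real cluster the framework runs many map and reduce tasks in parallel on different nodes and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input, as if the framework had split it into separate blocks
documents = ["big data tools", "spark and hadoop", "big data and spark"]

def map_phase(doc):
    # Emit a (word, 1) pair for every word in one input split
    return [(word, 1) for word in doc.split()]

def shuffle_phase(pairs):
    # Group all emitted values that share the same key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine the grouped values for one key into a single result
    return key, sum(values)

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

Because each `map_phase` call touches only its own split and each `reduce_phase` call touches only one key, both steps can be distributed independently, which is what makes the model scale.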
Best NoSQL Database
Cassandra
Data Ingestion Tools
Apache Kafka
SSIS (SQL Server Integration Services) & ODI (Oracle Data Integrator)
Apache NiFi
Logstash
Mohtat@ut.ac.ir
https://www.linkedin.com/in/mohtat
https://www.t.me/DataAnalysis
Contact Us
Thank You
