This document discusses different profiles in analytics and data science. It describes the roles of data scientists, machine learning engineers, data miners, predictive modelers, statisticians, industrial statisticians, actuaries, data engineers, business intelligence analysts, and data analysts. It provides examples of industries and tasks involved in each role. It also compares data scientists and data analysts, outlining their different responsibilities and focus on past vs future analysis. Finally, it discusses the growth of analytics salaries and skills in India from 2013 to the present.
2. Different Profiles of Analytics
DATA SCIENCE
• Job titles: data scientist, chief scientist, senior analyst, director of analytics, etc.
• Industries include digital analytics, search technology, marketing, fraud detection,
astronomy, energy, healthcare, social networks, finance, forensics, security
(NSA), mobile, telecommunications, and weather forecasting.
• Projects include taxonomy creation (text mining, big data), clustering applied to big
data sets, recommendation engines, simulations, rule systems for statistical
scoring engines, root cause analysis, automated bidding, forensics, exoplanet
detection, and early detection of terrorist activity or pandemics.
• Main components are machine-to-machine communications and automation.
• Overlaps with computer science, statistics, machine learning, data mining,
operations research, and business intelligence.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
3. Different Profiles of Analytics
MACHINE LEARNING - Very popular computer science
discipline
• Part of data science and closely related to data mining. Machine learning is
about designing algorithms (like data mining), but the emphasis is on prototyping
algorithms for production mode and designing automated systems.
• Python is now a popular language for ML development projects.
• Core algorithms include clustering and supervised classification, rule systems,
and scoring techniques.
• Deep learning is a sub-domain close to artificial intelligence.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
4. Different Profiles of Analytics
DATA MINING
• Designing algorithms to extract insights from rather large and potentially
unstructured data (text mining), sometimes called nugget discovery.
• Techniques include pattern recognition, feature selection, clustering, and
supervised classification, along with a few statistical techniques.
• Data mining thus has some intersection with statistics, and it is a subset of
data science.
• Data miners use open-source software such as RapidMiner.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
5. Different Profiles of Analytics
PREDICTIVE MODELING
• Predictive modeling projects occur in all industries across all disciplines.
• Aims to predict the future from past data, usually but not always using
statistical modeling.
• Predictions often come with confidence intervals.
• Roots of predictive modeling are in statistical science.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
6. Different Profiles of Analytics
STATISTICS
• Losing ground to data science, industrial statistics, operations research, data
mining, and machine learning, where the same clustering, cross-validation and
statistical training techniques are used, albeit in a more automated way and on
bigger data.
• Many professionals who were called statisticians 10 years ago have seen their
job title changed to data scientist or analyst in the last few years.
• Modern sub-domains include statistical computing, statistical learning (closer to
machine learning), computational statistics (closer to data science), data-driven
(model-free) inference, sport statistics, and Bayesian statistics.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
7. Different Profiles of Analytics
INDUSTRIAL STATISTICS
• Statistics frequently performed by non-statisticians (engineers with good
statistical training) working on engineering projects such as yield optimization
or load balancing (system analysts). They use very applied statistics, and their
framework is closer to Six Sigma, quality control and operations research than
to traditional statistics. Also found in the oil and manufacturing industries.
• Techniques used include time series, ANOVA, experimental design, survival
analysis, signal processing (filtering, noise removal, deconvolution), spatial
models, simulation, Markov chains, and risk and reliability models.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
8. Different Profiles of Analytics
ACTUARIAL SCIENCES
• A subset of statistics focusing on insurance (car, health, etc.).
• Uses survival models: predicting when you will die and what your health
expenditures will be, based on your health status (smoker, gender, previous
diseases), to determine your insurance premiums.
• Also predicts extreme floods and weather events to determine premiums.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
9. Different Profiles of Analytics
HPC
• High performance computing is not a discipline per se, but it should be of concern
to data scientists, big data practitioners, computer scientists and
mathematicians, as it can redefine the computing paradigms in these fields.
• HPC should not be confused with Hadoop and MapReduce: HPC is hardware-related,
Hadoop is software-related (though heavily reliant on Internet
bandwidth and on server configuration and proximity).
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
10. Different Profiles of Analytics
OPERATIONS RESEARCH (OR)
• It is about decision science and optimizing traditional business projects:
inventory management, supply chain, pricing. Practitioners make heavy use of Markov chain
models, Monte Carlo simulations, queuing and graph theory, and software
such as AIMS, Matlab or Informatica.
• Big, traditional old companies use OR; new and small ones (start-ups) use data
science to handle pricing, inventory management or supply chain problems.
• Car traffic optimization is a modern example of an OR problem, solved with
simulations, commuter surveys, sensor data and statistical modeling.
• OR has a significant overlap with Six Sigma, also solves econometric problems,
and has many practitioners and applications in the army and defense sectors.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
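As a small illustration of the Monte Carlo simulations mentioned for inventory problems, the sketch below estimates how often daily demand would exceed the stock on hand. Everything here is an assumption made for illustration: the normally distributed demand, its mean and standard deviation, and the function name `simulate_stockout_rate`.

```python
import random

def simulate_stockout_rate(stock_level, mean_demand, sd_demand, trials=10_000, seed=0):
    """Estimate the probability that one day's demand exceeds the stock on hand."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    stockouts = 0
    for _ in range(trials):
        # Draw one day's demand; negative draws are clipped to zero.
        demand = max(0.0, rng.gauss(mean_demand, sd_demand))
        if demand > stock_level:
            stockouts += 1
    return stockouts / trials

for level in (100, 110, 120, 130):
    rate = simulate_stockout_rate(level, mean_demand=100, sd_demand=15)
    print(f"stock {level}: estimated stockout probability {rate:.3f}")
```

Sweeping the stock level this way gives a simple view of the trade-off between holding more inventory and running out.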
11. Different Profiles of Analytics
ECONOMETRICS
• Econometrics is heavily statistical in nature, using time series models such as
auto-regressive processes.
• It also overlaps with operations research (itself overlapping with statistics!)
and mathematical optimization (simplex algorithm).
• Econometricians like ROC and efficiency curves.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
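The auto-regressive processes mentioned above can be sketched in a few lines. This example assumes the simplest case, a first-order process x[t] = phi * x[t-1] + noise, simulates it, and recovers phi by least squares. The helper names `simulate_ar1` and `estimate_phi` and the parameter values are hypothetical choices for illustration.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Generate n points of x[t] = phi * x[t-1] + Gaussian noise, starting from 0."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        series.append(prev)
    return series

def estimate_phi(series):
    """Least-squares estimate of phi from regressing x[t] on x[t-1] (no intercept)."""
    num = sum(a * b for a, b in zip(series[:-1], series[1:]))
    den = sum(a * a for a in series[:-1])
    return num / den

data = simulate_ar1(phi=0.7, n=5000)
print(f"estimated phi: {estimate_phi(data):.2f}")
```

With a long enough series the estimate should land close to the value used in the simulation.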
12. Different Profiles of Analytics
DATA ENGINEERING
• Performed by software engineers (developers) or architects (designers) in large
organizations (sometimes by data scientists in tiny companies).
• A sub-domain currently under attack is data warehousing, as this term is
associated with static, siloed conventional databases, data architectures, and
data flows, threatened by the rise of NoSQL, NewSQL and graph databases.
• Transforming these old architectures into new ones (only when needed), or
making them compatible with new ones, is a lucrative business.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
13. Different Profiles of Analytics
BUSINESS INTELLIGENCE
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
14. Different Profiles of Analytics
BUSINESS INTELLIGENCE
• Focuses on dashboard creation, metric selection, producing and scheduling data
reports (statistical summaries) sent by email or delivered/presented to
executives, competitive intelligence (analyzing third-party data), as well as
involvement in database schema design (working with data architects) to collect
useful, actionable business data efficiently.
• The typical job title is Business Analyst.
• Some are more involved with marketing, product or finance (forecasting sales and
revenue).
• Some have learned advanced statistics such as time series, but most only use
(and need) basic stats and light analytics, relying on IT to maintain databases
and harvest data.
• BI and market research (but not competitive intelligence) are currently
experiencing a decline.
• Part of the decline is due to not adapting to new types of data (e.g. unstructured text)
that require engineering or data science techniques to process and extract value.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
15. Different Profiles of Analytics
DATA ANALYTICS
• This has been the new term for business statistics since at least 1995, and it covers a
large spectrum of applications including fraud detection, advertising mix
modeling, attribution modeling, sales forecasts, cross-selling optimization
(retail), user segmentation, churn analysis, computing the lifetime value of a
customer and the cost of acquisition, etc.
• Except in big companies, data analyst is a junior role; these practitioners have
much narrower knowledge and experience than data scientists.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
16. Different Profiles of Analytics
BUSINESS ANALYTICS
• Same as data analysis, but restricted to business problems only.
• Tends to have a bit more of a financial, marketing or ROI flavor.
Resource: http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
17. DATA SCIENTIST vs. DATA ANALYST
Resource: http://www.edureka.co/blog/difference-between-data-scientist-and-data-analyst/
18. DATA SCIENTIST vs. DATA ANALYST
• A “Data Analyst” focuses on the movement and interpretation of
data, typically with a focus on the past and present.
• A “Data Scientist”, by contrast, may be primarily responsible for
summarizing data in such a way as to provide forecasts, or
insight into the future, based on the patterns identified from past and
current data.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
19. DATA SCIENTIST vs. DATA ANALYST
CRISP-DM Process Model
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
20. DATA SCIENTIST vs. DATA ANALYST
• Business understanding – Determine Business Objectives, Assess Situation,
Determine Data Mining Goals, Produce Project Plan
• Data understanding – Collect Initial Data, Describe Data, Explore Data, Verify
Data Quality
• Data preparation – Select Data, Clean Data, Construct Data, Integrate Data
• Modeling – Select Modeling Technique, Generate Test Design, Build Model, Assess
Model
• Evaluation – Evaluate Results, Review Process, Determine Next Steps
• Deployment – Plan Deployment, Plan Monitoring and Maintenance, Produce
Final Report, Review Project
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
21. DATA SCIENTIST vs. DATA ANALYST
• A Data Scientist is often heavily involved in the cleaning and manipulation
of data to support their modeling needs as well as the building and
evaluating of model designs which are intended to help guide changes in
business decisions.
• On the other hand, a Data Analyst may spend their time exploring data to
support troubleshooting efforts or to generate ideas for useful reports to
pitch to the customer.
• In general, while Data Analysts tend to be more Business focused, Data
Scientists are often Mathematically focused.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
22. DATA SCIENTIST vs. DATA ANALYST
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
23. DATA SCIENTIST vs. DATA ANALYST
• Data Analysts typically perform data migration and visualization roles that
focus on describing the past, while Data Scientists typically perform roles
that involve manipulating data and creating models to improve the future.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
24. DATA SCIENTIST vs. DATA ANALYST
• Situation:
• A large provider of streaming entertainment and data services wants to
improve call center performance and extract tactical business value from
call center data.
• Project Domain:
• Logged performance data from the firm’s proprietary hardware platform
and call center data tied to specific customers and device IDs.
• In the above case, the Data Analyst and the Data Scientist would both use the data,
but in different ways. The Data Analyst would be concerned with
reporting metrics, such as average call time, while the Data Scientist
would be concerned with using the historical data to predict the future,
such as call volumes for the coming months. Both roles are equally
important to the operation of the call center and help find solutions for
the center to run smoothly. The figure below details some solutions each
role creates.
Resource: http://captechconsulting.com/blog/richard-rivera/data-scientist-vs-data-analyst
26. Current Analytics Scenario
•2013
• Significant rise in Data Analytics and Big Data initiatives across all sectors
in India. The finance, telecom, e-commerce and retail sectors showed higher
investments in data-related tools and technologies.
• Industries like healthcare, auto and manufacturing also increased their
data-related spend.
• Hiring picked up considerably, but the demand-supply gap remained
wide given the shortage of trained analytics
professionals.
28. So…..
Tech Term | Stat/OR | Source of Data | Size of Data | Typical Software
Data Analysis | May be | Manual | Usually small | SPSS, SYSTAT, etc.
Business Intelligence | No | Business process | Usually large | Business Objects, MicroStrategy, Pentaho, etc.
Data Mining | May be | Business process | Very large | Clementine, e-miner, etc.
Analytics | Yes | Business process | Large | SAS, R, etc.
• Hence, a hallmark of Analytics is the application of statistics/OR in an industry
setting using business process data.
• SAS is the most widely used tool. R (open source) is growing fast.
30. Current Analytics Scenario
•2013 – Key Facts
• Though SAS remained a popular tool, trends show that R has been
increasing its share of the market. Several analytics companies use both
SAS and R, depending on project and client preferences.
• The Big Data tools Hadoop and MapReduce were industry favourites.
Companies are investing in building this capability, but are not using it
effectively yet.
• Hadoop skills were in demand, coupled with SAS and R.
• Web analytics, text analytics and social media analytics began to gain
popularity.
31. Current Analytics Scenario
•2014
• Demand for analytics is currently driven by businesses and
organizations that want to up-skill their employees. We predict that
in 2014, MBA colleges will place more emphasis on analytics as part
of their regular curriculum to cater to the demand-supply gap for
analytics professionals.
• A number of specialized analytics courses (such as the ones offered
by the Great Lakes Institute of Management, Chennai and the
Indian School of Business, Hyderabad) have already gained
popularity in 2013. We predict that many more such courses will be
launched in 2014.
36. Current Analytics Salary
• Average entry-level salaries have increased 27% since 2013, from Rs
520 thousand to Rs 660 thousand per annum.
• Typically, there is a 250% increase in salary from entry-level analyst
to manager.
• Managers in analytics command an annual salary upward of Rs 1500
thousand.
• At senior levels, annual salaries are upward of Rs 2500 thousand,
which is more than a 60% increase over a manager's salary.
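The percentages above can be checked with a short calculation. This is a sketch using only the figures quoted on the slide; the helper `pct_increase` is a hypothetical name. Note that the manager and senior figures are lower bounds ("upward of"), so the last ratio is only indicative. It comes out below the 250% quoted, which presumably reflects typical manager salaries above that floor.

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100

# Figures from the slide, in thousands of rupees per annum.
entry_2013, entry_now = 520, 660
manager_floor, senior_floor = 1500, 2500

print(f"entry level since 2013: {pct_increase(entry_2013, entry_now):.0f}%")
print(f"manager floor to senior floor: {pct_increase(manager_floor, senior_floor):.0f}%")
print(f"entry level to manager floor: {pct_increase(entry_now, manager_floor):.0f}%")
```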
37. Let’s Conclude for Today
• Salaries will continue to increase, and we will see professionals from
other sectors honing their analytics skills and switching careers.
• The scope of analytics will so permeate India Inc. that, 15 years from
now, we predict that professionals with no analytics and big
data skills will have no scope for growth.
• This seems harsh, but that is how dynamic data analytics is and how
pivotal data-backed decision making will become.
38. In God, We Trust... All Others Must Bring the "Data"
Srikanth Ayithy
about.me/srikanthayithy