The samples were collected by asking respondents to fill in an AIM survey about the tools and techniques data scientists use at work, covering sub-topics such as data visualisation tools, preferred operating systems and programming languages. We took opinions from everyone who practices data science, from professionals with less than two years of experience to CXOs, to get a thorough picture of the working environment in this growing field.
Our survey was met with much enthusiasm, and we got some great insights from it. Some were expected; many were real eye-openers.
SGCI - The Science Gateways Community Institute: International Collaboration ... - Sandra Gesing
Science gateways - also called virtual research environments or virtual labs - allow science and engineering communities to access shared data, software, computing services, instruments, and other resources specific to their disciplines. The US Science Gateways Community Institute (SGCI), opened in August 2016, provides free resources, services, experts, and ideas for creating and sustaining science gateways. It offers five areas of service to the science gateway developer and user communities: the Incubator, Extended Developer Support, the Scientific Software Collaborative, Community Engagement and Exchange, and Workforce Development. While all of these services are available to US-based communities, the Incubator, the Scientific Software Collaborative, and Community Engagement and Exchange also serve international communities. SGCI aims to support the community beyond national borders and to form and deepen collaborations with partner organizations and coalitions related or beneficial to the science gateways community. Research topics are independent of national borders, and researchers worldwide can benefit from each other's results, software, data, and lessons learned, whether via online materials and publications or at international events. The gateway community has benefited from this kind of exchange for years, and one mission of SGCI is to support the international community. This talk presents related work on the benefits of international collaboration in general, and for science gateways in particular. It goes into detail on SGCI's ongoing work at an international scale and on work planned for the near future to foster collaborations despite challenges such as different time zones and long distances between collaborators.
Slides CHASE 2019 Connected Health Conference - Thursday 26 September 2019 - Amélie Gyrard
Paper: IAMHAPPY: Towards An IoT Knowledge-Based Cross-Domain Well-Being Recommendation System for Everyday Happiness
IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) Conference
Keynote - An overview on Big Data & Data Science - Dr Gregory Piatetsky-Shapiro - Data ScienceTech Institute
Data Science Tech Institute - Big Data and Data Science Conference around Dr Gregory Piatetsky-Shapiro.
Keynote - An overview on Big Data & Data Science Dr Gregory Piatetsky-Shapiro - KDnuggets.com Founder & Editor.
Paris May 23rd & Nice May 26th 2016 @ Data ScienceTech Institute (https://www.datasciencetech.institute/)
Data Science Skills Study 2019 by AIM And Imarticus Learning - Praj H
Our latest Data Science Skills Study 2019, by Analytics India Magazine and Imarticus Learning, takes a deeper look at the key trends in the tools and technologies deployed across sectors and at how companies are staying ahead of the pack. As analytics and machine learning reach deeper into operations, there is significant disruption in the workplace, with data scientists and data analysts dabbling in newer tools.
The presentation is about the career path in the field of Data Science. Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
This document provides an introduction to data science. It defines data science as a multi-disciplinary field that uses scientific methods and processes to extract knowledge and insights from structured and unstructured data. The document discusses the importance and impact of data science on organizations and society. It also outlines common applications of data science and the roles and skills required for a career in data science.
Why is Python becoming the language of choice for Data Analysts? - Rang Technologies
Python is rapidly emerging as the go-to language for data analysts, poised to become the ultimate powerhouse in the realm of data science within the next five years. Renowned as the "anaconda" of programming languages, Python boasts open-source accessibility and an object-oriented structure, seamlessly integrating with a vast array of libraries to simplify engineers' tasks. To know more about Python languages visit here: https://www.rangtech.com/blog/python-language-the-anaconda-in-data-science
An introductory but highly practical talk on starting a data science career and life. It touches on all the main aspects of the path toward becoming a data scientist, also seen through a personal-development perspective. We also discuss the role a data scientist ultimately fulfills, as an individual or as part of a team, in the technology innovation life cycle and the product life cycle.
This document provides an introduction to data science, including definitions of data science, its impact and importance. It discusses how data science affects organizations and provides competitive advantages. Examples of data science applications are given across various domains like banking, healthcare, transportation and more. The document also outlines the road to becoming a data scientist and what skills are required, such as learning to code, mathematics, machine learning techniques and software engineering. In summary, data science uses scientific methods to extract knowledge and insights from data, it benefits society in areas like healthcare, transportation and environment, and becoming a data scientist requires strong coding and analytical skills.
This presentation was provided by Daniella Lowenberg of the California Digital Library during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
OpenAIRE and EUDAT services and tools to support FAIR DMP implementation - Research Data Alliance
The document provides an overview of the Open Research Data Pilot, the data management plan, and OpenAIRE tools and services that support implementation of FAIR data management plans. It discusses the aims of the Open Research Data Pilot, which Horizon 2020 projects are required to participate in, and the types of data that must be deposited. It also covers topics like creating a data management plan, selecting a repository, making data FAIR, and OpenAIRE support resources such as briefing papers, webinars, and the Zenodo repository.
FAIR for the future: embracing all things data - ARDC
FAIR for the future: embracing all things data - Natasha Simons, Keith Russell and Liz Stokes, presented at Taylor & Francis Scholarly Summits in Sydney 11 Feb 2019 and Melbourne 14 Feb 2019.
PAARL's 1st Marina G. Dayrit Lecture Series held at UP's Melchor Hall, 5F, Proctor & Gamble Audiovisual Hall, College of Engineering, on 3 March 2017, with Albert Anthony D. Gavino of Smart Communications Inc. as resource speaker on the topic "Using Big Data to Enhance Library Services"
Analytic Transformation | 2013 Loras College Business Analytics Symposium - Cartegraph
The document summarizes key points from a 2013 analytics symposium. It discusses trends in big data discovery, mobility, real-time decisions, and predictive analytics. Big data allows tapping diverse data sets to find unknown relationships and make data-driven decisions. It impacts many industries. Real-time data and decisions are important as over 80% of executives say critical information is delivered too late. Predictive analytics and visualization help add meaning to data. Mobility increases access and analytical collaboration anywhere.
This document provides an overview of data science including its importance, what data scientists do, how the field has emerged, and how to become a data scientist. It notes that by 2018 the US could face shortages of people with data analytics skills. It then discusses how LinkedIn's early growth in 2006 exemplifies the data science process of framing questions, collecting and processing data, exploring patterns, and communicating results. Finally, it outlines the tools used in data science like SQL, analytics software, and machine learning and discusses getting started in the field through education, curiosity, and ongoing learning with mentorship support.
Getting started in Data Science (April 2017, Los Angeles) - Thinkful
The document discusses the rise of data science and the skills needed for data scientists. It defines data science as the intersection of engineering, statistics, and communication. Data scientists analyze large datasets to answer important business questions. The document uses LinkedIn in 2006 as a case study, outlining how a data scientist there framed questions, collected and processed user data, explored patterns, and communicated results to improve the user experience and growth. It highlights tools like SQL, analytics software, and machine learning that data scientists use and stresses the importance of curiosity, technical skills, and strong communication for those interested in the field.
Speaker: Franz Walder, Product Manager, panagenda
Abstract: panagenda reached out to 750+ professionals to share their company’s Domino application strategy. Join this session to find out what was most important to your peers and what challenges they had to overcome to make their project a success. Find out about the critical questions everybody should ask and have answers to throughout their project. Franz Walder presents the exciting results of the survey and explains what role analytics can play when tackling these challenges.
This document provides an overview of data science including its importance, what data scientists do, how the field has emerged, and how to become a data scientist. It discusses how data science can help answer important business questions using LinkedIn in 2006 as a case study. It also outlines the typical data science process of framing questions, collecting and cleaning data, exploring patterns, and communicating results. Finally, it introduces some common data science tools like SQL, analytics software, and machine learning algorithms and discusses options for continuing education in data science.
This document outlines a 10-step framework for developing data science applications. It begins with articulating the business problem and the data questions. Next steps include developing a data acquisition and preparation strategy, exploring and formatting the data, defining the goal, and shortlisting techniques. Later steps evaluate constraints, establish evaluation criteria, fine-tune algorithms, and plan for deployment and monitoring. The document also provides background on the speaker and the organization, which offers data science, quantitative finance, and machine learning programs and consulting using Python, R, and MATLAB on its online sandbox platform.
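A framework like this can be encoded as an ordered checklist so that each stage is tracked explicitly. The sketch below is purely illustrative: the step names paraphrase the summary above, and the executor function is a hypothetical stub.

```python
# Hypothetical encoding of the 10-step framework as an ordered checklist;
# the step names paraphrase the summary above.
STEPS = [
    "articulate the business problem and data questions",
    "develop a data acquisition and preparation strategy",
    "explore and format the data",
    "define the goal",
    "shortlist techniques",
    "evaluate constraints",
    "establish evaluation criteria",
    "fine-tune algorithms",
    "plan deployment",
    "plan monitoring",
]

def run_pipeline(steps, execute):
    """Run each stage in order and record its status."""
    return [(step, execute(step)) for step in steps]

# With a stub executor, every stage simply reports success.
report = run_pipeline(STEPS, execute=lambda step: "done")
```

Keeping the stages as data rather than hard-coded calls makes it easy to reorder, skip, or audit them per project.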
Workshop 1. Architecting Innovative Graph Applications
Join this hands-on workshop for beginners, led by Neo4j experts, and learn to systematically uncover contextual intelligence. Using a real-life dataset, we will build a graph solution step by step, from designing the graph data model to running queries and visualizing the data. The approach is applicable across multiple use cases and industries.
This document discusses why Python is a popular programming language for data science. It notes that Python has a clean syntax, expansive library, and large user base. Additionally, major companies like Google use Python for various applications. The document also provides examples of what businesses use Python for, including building data pipelines, descriptive analytics, machine learning, and data science tasks like clustering and prediction. Finally, it outlines some common tools and processes used in working as a data analyst or scientist, such as cleaning, reshaping, and analyzing data in Python.
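The clean-reshape-analyze loop mentioned at the end of that summary can be sketched in a few lines of pandas. The dataset and column names below are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sales records; the columns are invented for illustration.
raw = pd.DataFrame({
    "region": ["north", "north", "south", "south", None],
    "month": ["jan", "feb", "jan", "feb", "jan"],
    "revenue": [100.0, 120.0, 90.0, None, 50.0],
})

# Clean: drop rows with a missing key, fill missing numeric values.
clean = raw.dropna(subset=["region"]).fillna({"revenue": 0.0})

# Reshape: pivot months into columns, one row per region.
wide = clean.pivot_table(index="region", columns="month",
                         values="revenue", aggfunc="sum")

# Analyze: a simple descriptive statistic per region.
totals = wide.sum(axis=1)
```

Each step is one method call, which is much of what the document means by Python's "clean syntax" for analysts.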
Moving Beyond Batch: Transactional Databases for Real-time Data - VoltDB
Join guest Forrester speaker, Principal Analyst Mike Gualtieri, and Dennis Duckworth, Director of Product Marketing at VoltDB, to learn how enterprises can create a real-time, “origin-zero” data architecture within transactional databases to become a real-time enterprise.
Neo4j GraphDay Seattle - Sept 19 - Connected data imperative - Neo4j
The document outlines an agenda for a Neo4j Graph Day event including sessions on connected data, graphs and artificial intelligence, a lunch break, Neo4j training, and a reception. Key topics include Neo4j in production environments, its role in boosting artificial intelligence, and training opportunities.
Keynote: Graphs in Government - Lance Walter, CMO - Neo4j
This document contains an agenda and presentation slides for a Neo4j Graphs in Government event. The presentation introduces graph databases and Neo4j, discusses how graphs can help solve network-oriented problems, provides examples of graph use cases in various industries, and highlights new features in Neo4j 4.0 like easy management, unlimited scaling, and granular security. Case studies demonstrate how Neo4j has helped organizations like the US Army, MITRE, Adobe, and the German Center for Diabetes Research tackle complex data challenges.
The agenda of the talk is broken into two parts:
1. Query Understanding [30 mins] (Sonu Sharma)
• NLP-based deep learning models for finding the intent of a query in a particular taxonomy/category: description and Jupyter-notebook demonstration [20 mins]:
  o Multi-label/multi-class classification model from scratch in Keras
  o Feature engineering in Spark Scala and pandas
  o Keras Functional API details in TF 2.0
  o The "ImageNet moment" of NLP: recent word embeddings such as ELMo and BERT
  o Understanding deep neural networks such as bidirectional long short-term memory (BiLSTM) and character embeddings for language modeling
• NLP-based deep learning models for query tagging with entities like brand, color, nutrition, product quantity, etc. using named entity recognition [10 mins]:
  o Building a custom model with the TensorFlow Estimator API
  o Traditional word embeddings such as GloVe and fastText
  o Query (text) preprocessing
  o Sequence modeling using conditional random fields (CRF)
  o Saving and restoring heavy models in TF via the SavedModel format
2. Related Searches [20 mins] (Atul Agarwal)
• NLP-based deep learning model for predicting the next search keyword: model description and Jupyter-notebook demonstration [20 mins]:
  o Building a sequence-to-sequence (Seq2Seq) model using long short-term memory (LSTM) layers in Keras
  o Comparing word embeddings (word2vec, fastText, GloVe, etc.) in popular frameworks such as gensim
  o Keras Sequential API details
  o Similarity search based on FAISS (Facebook AI Similarity Search)
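As a concrete illustration of the first agenda item, here is a minimal sketch of a multi-label query classifier built with the Keras functional API in TF 2.x. The vocabulary size, layer widths, and random batch are invented for demonstration; a real model would train on labeled queries.

```python
import numpy as np
import tensorflow as tf

# Hypothetical setup: queries arrive as padded token-id sequences, and
# each query may belong to several of 5 categories (multi-label).
vocab_size, seq_len, n_labels = 1000, 12, 5

inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, 32)(inputs)
# BiLSTM encoder, as mentioned in the agenda.
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16))(x)
# Sigmoid (not softmax), so each label is predicted independently.
outputs = tf.keras.layers.Dense(n_labels, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
# binary_crossentropy pairs with the per-label sigmoid outputs.
model.compile(optimizer="adam", loss="binary_crossentropy")

# A tiny random batch, just to show the shapes flowing through.
queries = np.random.randint(0, vocab_size, size=(8, seq_len))
probs = model.predict(queries, verbose=0)
```

The sigmoid-plus-binary-crossentropy pairing is what distinguishes the multi-label case from the multi-class (softmax) case covered in the same session.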
This document discusses using a single channel EEG device to recognize emotions from EEG data. It collected data from 10 individuals labeled as stressed or relaxed. It preprocessed the raw EEG data using filters to isolate brain signals. It then used deep learning models including an LSTM network with and without attention to classify emotions. The LSTM with attention achieved 85% accuracy, which was an improvement over the LSTM without attention. Potential applications discussed include using EEG for stress reduction by customizing music, and emotion or word prediction. The document also discusses opportunities for future enhancements such as using convolutional layers or multi-modal networks incorporating additional physiological sensors.
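The attention step that the summary credits with the accuracy gain can be sketched in a few lines of NumPy: score each LSTM hidden state, normalize the scores, and take the weighted sum. The hidden states and attention vector below are random stand-ins, not outputs of a trained model.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical LSTM hidden states for one EEG window: T steps, d units.
T, d = 6, 4
rng = np.random.default_rng(0)
H = rng.normal(size=(T, d))   # stand-in for real LSTM outputs
w = rng.normal(size=(d,))     # learned attention vector (random here)

scores = H @ w                # one relevance score per time step
alpha = softmax(scores)       # attention weights, summing to 1
context = alpha @ H           # weighted sum of hidden states

# A classifier head would score `context` instead of only the last
# hidden state, which is the gain the summary attributes to attention.
```

The weighting lets informative time steps dominate the representation rather than forcing everything through the final LSTM state.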
Similar to Data Science Skills Study 2018 by AIM & Great Learning
Flood & Other Disaster forecasting using Predictive Modelling and Artificial ...Analytics India Magazine
Over 2.3 Billion people are affected due to floods in last 20 years and causing countless death , More than 92,million cattle are lost every year, seven million hectares of land is affected, and damage is over trillions dollars when taken globally in last 5 years. Floods are complicated natural events. It depends on several parameters, so it is very difficult to model analytically. The floods in a catchment depends on the characteristics of the catchment, rainfall and antecedent conditions. So the estimation of the flood peak is a very complex problem. Its due to the lack of Flood Prediction System which can predict the situation accurately. To Overcome this challenge we are building a Flood Prediction System using Predictive modelling. However we have divided our idea into small fragments but enough to be used globally. We have considered most flooded state of India, but can be used widely for all the low lying geographical regions. •The plains of Bihar, adjoining Nepal, are drained by a number of rivers that have their catchments in the steep and geologically nascent Himalayas. Kosi, Gandak,Burhi Gandak, Bagmati, Kamla Balan, Mahananda and Adhwara Group of rivers originates in Nepal, carry high discharge and very high sediment load and drops it down in the plains of Bihar. · About 65% of catchments area of these rivers falls in Nepal/Tibet and only 35% of catchments area lies in Bihar. · Bihar is India’s most flood-prone State, with 76 percent of the population, in the north Bihar living under the recurring threat of flood devastation. About 68800 sq Km out of total geographical area of 94163 sq Km comprising 73.06 percent is flood affected. · According to some historical data, 16.5% of the total flood affected area in India is located in Bihar while 22.1% of the flood affected population in India lives in Bihar. · From 1979 to Present day more than 8,873 Humans & 27,573 animals have lost their life due to flood. 
Some of Tools & Technology which is being used & can be used for Flood Prediction: •IBM. Watson Studio democratizes machine learning and deep learning to accelerate infusion of AI in to drive innovation. •An Intelligent Hydro-informatics Integration Platform for Regional Flood Inundation Warning Systems. •Three-Parameter Muskingum Model Coupled with an Improved Bat Algorithm. · Deep Learning with a Long Short-Term Memory Networks Approach for Rainfall-Runoff Simulation
AI for Enterprises-The Value Paradigm By Venkat Subramanian VP Marketing at B...Analytics India Magazine
AI is here, call it buzz, cause it a bubble, we are smack in the middle of an AI revolution. While there is a strong view building about consumer AI applications, there still seems to be some scepticism about AI for enterprises, primarily due to the lack of clarity and focus on how AI can actually deliver value for enterprises. At BRIDGEi2i, we believe it is important to have a non-fragmented view of the AI ecosystem and a “Value Roadmap” for AI in the enterprise context. As CxOs, it is important to understand where the enterprise is in the transformation journey and define value accordingly. This talk will throw light on how to look at the enterprise AI ecosystem and build the right roadmap for value.
Keep it simple and it works - Simplicity and sticking to fundamentals in the ...Analytics India Magazine
With the buzz around AI and ML there is an increasing tendency for leaders and data scientists to move towards complex problem-solving. Its important to unlearn the tendency to gravitate towards complexity. In this talk we will see why avoiding complexity in ML solutions is a wiser and a quicker way to solve business problems. We can also visit some thumb rules to build pragmatic and useful models. Simplicity and sticking to fundamentals is the next "big" thing in the world of big data.
Feature Based Opinion Mining By Gourab Nath Core Faculty – Data Science at Pr...Analytics India Magazine
Suppose that a customer who has given a high rating about a mobile phone writes the following review about the product: The front camera of the phone is excellent! Truly speaking, this is the best front camera I have experienced so far. From this review, we can understand two things. First, the customer holds a positive opinion about the phone. Secondly, the front camera of the phone is the targeted feature on which the opinions have been expressed in the review. In this workshop, we will be particularly interested in discovering patterns as indicated in the second case. We will discuss a framework that enables us to first discover the targets on which the opinions have been expressed in a review and then determine the polarity of the opinions. This kind of detailed analysis helps us to discover the components or features of the products which the customers have liked or disliked and thus help us to better summarize the information.
Deciphering AI - Unlocking the Black Box of AIML with State-of-the-Art Techno...Analytics India Magazine
Most organizations understand the predictive power and the potential gains from AIML, but AI and ML are still now a black box technology for them. While deep learning and neural networks can provide excellent inputs to businesses, leaders are challenged to use them because of the complete blind faith required to ‘trust’ AI. In this talk we will use the latest technological developments from researchers, the US defense department, and the industry to unbox the black box and provide businesses a clear understanding of the policy levers that they can pull, why, and by how much, to make effective decisions?
Getting your first job in Data Science By Imaad Mohamed Khan Founder-in-Resid...Analytics India Magazine
Getting your first job in Data Science is difficult. You’ve been applying to jobs, but they keep rejecting you. You don’t know what to do and how you could differentiate yourselves amidst the pool of candidates? In this talk, we’ll be going through different tips and techniques you could use to find that elusive Data Science jobs. They’ve worked for me and probably will work for you too!
With enterprises putting digital at the core of their transformation, our annual Data Science & AI Trends Report explores the key strategic shifts enterprises will make to stay intelligent and agile going into 2019. The year was marked by a series of technological advances, including advances in AI, deep learning, machine learning, hybrid cloud architecture, edge computing (with data moving away to edge data centres), robotic process automation, a spurt of virtual assistants, advancements in autonomous tech and IoT.
Everyone is talking about Artificial Intelligence — the new normal, which has entered almost every work process across industries. Enterprises are rethinking and strengthening their AI capabilities, using it as a tool to improve products and services. With AI becoming crucial to enterprise success, upskilling has become the new mantra among Indian IT professionals, who are keen to make an impact in their careers with Machine Learning and AI.
In our annual AI Study with Great Learning, we take a look at key AI trends dominating the Indian AI market-leading companies, professionals, salaries, jobs broken down by cities and how AI’s potential for industry growth has risen over the last few years. In the second half of the study, we cover AI literacy in India through Great Learning’s comprehensive AI/ML programs that are bridging the current skill gap and consequently boosting workforce transitions.
Emerging engineering issues for building large scale AI systems By Srinivas P...Analytics India Magazine
The document discusses an online 6-month certificate program in artificial intelligence and deep learning from Manipal Prolearn. It provides awarding from MAHE, hands-on training using real-world data from different domains, and instruction from industry experts. The program teaches skills for developing end-to-end AI/ML systems and covers topics like data acquisition, modeling, evaluation, and deployment.
Predicting outcome of legal case using machine learning algorithms By Ankita ...Analytics India Magazine
This document summarizes a presentation on predicting the outcomes of legal cases using machine learning models. It discusses extracting data from judgment documents and identifying key features for analysis. Exploratory data analysis was conducted on 202 observations to understand patterns. Logistic regression, KNN, random forest, and support vector machine models were developed. The tuned support vector machine model achieved the highest accuracy of 95% based on 10-fold cross-validation. Overall, support vector machine provided the best performance. The models tended to predict non-guilty outcomes more frequently due to the skewed data. Future work involves developing a mobile application for these predictive capabilities.
Bringing AI into the Enterprise - A Practitioner's view By Piyush Chowhan CIO...Analytics India Magazine
Artificial Intelligence is slowly getting its way into organisations. AI journey can be quite complex and a proper roadmap would be needed to realize the benefits. This presentation will talk about common myth and a high level approach for bringing AI into the enterprise.
www.analyticsindiasummit.com
Explainable deep learning with applications in Healthcare By Sunil Kumar Vupp...Analytics India Magazine
We started relying on the decisions made by deep learning models, however why it works and how it works are still big questions for most of us. We shall try to open that black box of deep learning which is essential to build trust for wide spread adoption. The speaker shall address the importance of feature visualization and localization in deep learning models esp. convolutional neural networks. He shares the results of applying methods such as activation map, deconvolution and Grad-CAM in healthcare.
Getting started with text mining By Mathangi Sri Head of Data Science at Phon...Analytics India Magazine
The workshop will empower you and get started with analyzing text data, discover patterns and what are the best ways to convert unstructured to structured data. We will also build a quick classification model and understand techniques to improve model performance. Towards the end lets quickly do a sentiment analysis on data corpus and discuss the next steps to improve model accuracy. Please come prepared with a working laptop with Jupyter Notebook and Python 2.7. Participants who have a minimum working knowledge of supervised models is encouraged.
“Who Moved My Cheese?” – Sniff the changes and stay relevant as an analytics ...Analytics India Magazine
The “Sexiest job of the 21st century” is often surveyed to be poorly defined, intermittently satisfying and vaguely understood in most board rooms. As success stories are widely publicized, senior business leaders’ expectations from analytics are rising quickly. And the field itself is changing rapidly - with speciality skills becoming self-service in no time. In that context, the talk explores how the various analytics roles across the spectrum are changing. And what it takes for analytics professionals to stay relevant, contribute meaningfully to business results and play a critical role in shaping business strategy.
www.analyticsindiasummit.com
"Route risks using driving data on road segments" By Jayanta Kumar Pal Staff ...Analytics India Magazine
Going out for dinner in Mumbai during an extended stay, or planning for a long road-trip across the wild west of Rajasthan, the first thing one looks at is Maps, that informs the relative distance, estimated time and congestion areas of different routes for the drive. Zendrive built state-of-the-art technologies on its huge cache of driving data from smartphones and OBD, to add a significant dimension to the route mapping of Google, that is safety risk of the route. Essentially the technology is built on millions of drivers zipping through the route or segments thereof. Automobile Insurance expands in UBI- where it has been established that tracking a driver’s behavior behind the wheels (like Hard Brake, Speeding etc) can predict significant differences between their chances of collisions. Looking at the same event data from the road perspective, aggregating the relative event density on road stretches also predict the relative chances of collision on that segment. We have used map matching using GIS techniques, parametric density estimation and rare event modeling using quasi-Poisson GLM to analyze our data, build the models and finally implement the scoring system across the GIS route maps. Key learnings : Relation between dangerous driving events and collisions Route risk as an aggregate of all the drivers (or sample thereof) and their driving risk. The route you take for commute may determine your auto insurance. Outline : Usage Based Insurance : relation between collision rates and dangerous driving. Driving events : aggressive acceleration, hard brake, speeding, phone use, aggressive turns Poisson GLM modeling to predict collision rates using driving data Events on a road segment : map-matching using GIS techniques to split trips along road stretches, and aggregate such events along the spatio-temporal dimension across all drivers. Route risk of the road segment and any route comprising such segments. 
Driving risk along such routes and corresponding collision risks using transfer of the GLM model. Assignment of risks to drivers on their daily route of commute, to be used in UBI.
www.analyticsindiasummit.com
“Who Moved My Cheese?” – Sniff the changes and stay relevant as an analytics ...Analytics India Magazine
By Phani Mitra VP Analytics & Strategy at Dr. Reddy’s
The “Sexiest job of the 21st century” is often surveyed to be poorly defined, intermittently satisfying and vaguely understood in most board rooms. As success stories are widely publicized, senior business leaders’ expectations from analytics are rising quickly. And the field itself is changing rapidly - with speciality skills becoming self-service in no time. In that context, the talk explores how the various analytics roles across the spectrum are changing. And what it takes for analytics professionals to stay relevant, contribute meaningfully to business results and play a critical role in shaping business strategy.
www.analyticsindiasummit.com
The analytics education market in India is exploding with analytics institutes providing a slew of instructional courses that are in line with the industry demand. This study aims to provide an overview of the analytics education landscape in India, the type of learning models offered and how online learning has become an inherent part of the analytics ecosystem in India.
Analytics & Data Science Industry In India: Study 2018 - by AnalytixLabs & AIMAnalytics India Magazine
The data analytics market in India is growing at a fast pace, with companies and startups offering analytics services and products catering to various industries. Different sectors have seen different penetration and adoption of analytics, and so is the revenue generation from these sectors.
The Analytics and Data Science Industry Study 2018 takes into account various trends that analytics industry in India is witnessing, revenue generated through various geographies, analytics market size by sector, across cities etc. It also takes into consideration analytics professionals in India across work experience and education.
This year’s study is brought to you in association with AnalytixLabs, a pioneer and one of the first analytics training institutes in India. The study is a result of extensive primary and secondary research conducted over a duration of two months, where we got in touch with analytics companies and professionals across various industries such as banking, finance, ecommerce, retail, pharma, healthcare and others.
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
06-20-2024-AI Camp Meetup-Unstructured Data and Vector DatabasesTimothy Spann
Tech Talk: Unstructured Data and Vector Databases
Speaker: Tim Spann (Zilliz)
Abstract: In this session, I will discuss the unstructured data and the world of vector databases, we will see how they different from traditional databases. In which cases you need one and in which you probably don’t. I will also go over Similarity Search, where do you get vectors from and an example of a Vector Database Architecture. Wrapping up with an overview of Milvus.
Introduction
Unstructured data, vector databases, traditional databases, similarity search
Vectors
Where, What, How, Why Vectors? We’ll cover a Vector Database Architecture
Introducing Milvus
What drives Milvus' Emergence as the most widely adopted vector database
Hi Unstructured Data Friends!
I hope this video had all the unstructured data processing, AI and Vector Database demo you needed for now. If not, there’s a ton more linked below.
My source code is available here
https://github.com/tspannhw/
Let me know in the comments if you liked what you saw, how I can improve and what should I show next? Thanks, hope to see you soon at a Meetup in Princeton, Philadelphia, New York City or here in the Youtube Matrix.
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/141-10June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
https://www.meetup.com/unstructured-data-meetup-new-york/events/301383476/?slug=unstructured-data-meetup-new-york&eventId=301383476
https://www.aicamp.ai/event/eventdetails/W2024062014
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
Data Science Skills Study 2018
By AIM & Great Learning

CONTENTS

Introduction
Which language do data scientists prefer for statistical modelling?
Which data science methods are the most popular at work?
Which is the most popular Python general-purpose library?
Which tools do data scientists prefer?
Which dashboard/visualisation tools do data scientists prefer?
Which cloud provider do data scientists prefer?
What kind of learning resources do data scientists use to keep themselves updated?
Where do data scientists find open data?
Which OS do most data scientists use at work?
Preferred development environment
How is code shared at your workplace?
What is the neural network architecture that data scientists use most frequently?
Which big data tool have you used the most?
Which GPUs do data scientists use at work?
Our respondents
Conclusion
INTRODUCTION

Data Science is an emerging field that is now being integrated with industries across all sectors. This year Analytics India Magazine, in association with Great Learning, decided to find out what goes into the making of a good Data Scientist. We spent a lot of time finding out the tools and techniques used by these new technology professionals. From languages to coding practices and GPUs, we garnered interesting and insightful answers from our comprehensive survey.

About the study:

We looked at variations in technology, tools, work experience and educational qualifications in this survey. We took opinions from all those who practice data science, from professionals with less than two years of experience to CXOs, to get a thorough idea of the working environment in this growing field.

Our survey was met with much enthusiasm, and we got some great insights from it. Some of them were expected, and many of them were real eye-openers. Without further ado, let's take a deep dive into the study!
Disclaimer: This document is the result of continued research by Analytics India Magazine and Great Learning. Permission may be required from at least one of the parties for reproduction of the information in this report. All rights are reserved with the aforementioned parties.

This study is the result of extensive primary and secondary research, carried out over a period of one month by Analytics India Magazine, in association with Great Learning. The research methodology included a systematic plan to identify the various factors that influence data scientists to use a particular set of tools and techniques in their professional work. The data was collected by sending survey questions to readers, professionals from the community, students and others, across all major cities in India.
WHICH LANGUAGE DO DATA SCIENTISTS PREFER FOR STATISTICAL MODELLING?

- The favourite language for data scientists today is Python, used by almost 44% of professionals
- A close second is R at 35%, another clear favourite with data scientists due to its versatility
- SQL (6%) and SAS (7%) claim only a minor share of the attention of data scientists

Chart: Python 44%, R 35%, SAS 7%, SQL 6%, Other 5%, Matlab 3%
WHICH DATA SCIENCE METHODS ARE THE MOST POPULAR AT WORK?

In this section, we asked data scientists to pick out their most frequently used statistical methods.

- 72% of data scientists answered that they used Logistic Regression the most at work
- This was followed by Decision Trees at 56% and Neural Networks at 48%

Chart (number of responses): Logistic Regression, Decision Trees, Random Forest, Neural Networks, Bayesian Techniques, Ensemble Methods, SVMs, GBM, CNNs, RNNs, Evolutionary Approaches, Markov Logic, HMMs
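Logistic Regression's popularity at work is easy to understand: the model is little more than a weighted sum passed through a sigmoid, and even its training loop fits in a few lines. Below is a minimal from-scratch sketch in pure Python on hypothetical toy data; a real project would typically reach for a library such as scikit-learn instead.

```python
import math

def sigmoid(z):
    # Squash a raw score into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    # Single-feature logistic regression fitted by gradient descent on log-loss
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # (p - y) is the gradient of the log-loss w.r.t. the raw score
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Hypothetical toy data: larger x tends to mean class 1
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in [0.8, 3.8]])  # [0, 1]
```

The same one-liner idea, `sigmoid(w . x + b)`, underlies the multi-feature version; libraries mostly add regularisation and faster solvers.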
WHICH IS THE MOST POPULAR PYTHON GENERAL-PURPOSE LIBRARY?

Python has one of the largest programming communities in the world, and there are plenty of libraries a data scientist can use to analyse large amounts of data. Here are our readers' favourites:

- Pandas emerged as the clear choice for most data scientists, at almost 41%
- NumPy was the second favourite at 24%
- Sklearn and Matplotlib followed at 17% and 14% respectively

Chart: Pandas 41%, NumPy 24%, Sklearn 17%, Matplotlib 14%, Other 4%
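Pandas' appeal comes from how little code a typical tabular workflow needs. A minimal sketch, assuming pandas is installed; the mini-dataset and column names below are invented purely for illustration:

```python
import pandas as pd

# Hypothetical mini-dataset of survey responses
df = pd.DataFrame({
    "language": ["Python", "R", "Python", "SAS", "Python", "R"],
    "experience_years": [1, 4, 7, 3, 2, 6],
})

# Share of respondents per language, as percentages
share = df["language"].value_counts(normalize=True) * 100
print(share["Python"])  # 50.0

# Mean experience per language
mean_exp = df.groupby("language")["experience_years"].mean()
print(mean_exp["R"])  # 5.0
```

The same two idioms, `value_counts` and `groupby`/aggregate, cover a large share of everyday descriptive analysis, which goes some way to explaining the 41% figure above.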
WHICH TOOLS DO DATA SCIENTISTS PREFER?

With a plethora of data analytics tools available online, we asked data scientists whether they were willing to use open-source tools at work. The answer was a resounding yes.

- Almost 89% of data scientists said that they preferred to work with open-source tools
- Only 8% said that they liked to work with custom-made tools, tweaked and personalised for their particular projects

Chart: Open Source 89%, Custom Made 8%, Paid 3%
WHICH DASHBOARD/VISUALISATION TOOLS DO DATA SCIENTISTS PREFER?

Data visualisation can be tricky for many data scientists. Crunching numbers is one thing, but telling a story with numbers is a whole different deal. When we asked our readers about this, they had one clear winner:

- More than half the respondents, 51%, said that they preferred Tableau as their dashboard or visualisation tool

Chart: Tableau 51%, Microsoft BI 12%, Others 12%, IBM Analytics 11%, Qlikview 8%, SAP Analytics 6%
WHICH CLOUD PROVIDER DO DATA SCIENTISTS PREFER?

Information flow is a part of data science. While data usage and storage are important, security and privacy of the data are also key to the job.

- Amazon Web Services is the clear winner here, with over 45% of the votes
- Google Cloud is the second favourite with almost 34% of the votes

Chart: AWS 45%, Google Cloud 34%, Microsoft Azure 18%, Others 3%
WHAT KIND OF LEARNING RESOURCES DO DATA SCIENTISTS USE TO KEEP THEMSELVES UPDATED?

With ever-changing technology, it is vital for data scientists to keep themselves updated. And they seem to have found interesting ways to do so!

- 76% of our readers said that they liked watching tutorials and videos on YouTube
- Almost 54% of data scientists said that they like learning the old-school way, through books and e-books
- 46% of respondents also look at MOOCs as a way to upskill themselves

Chart (number of responses): YouTube videos, books/e-books, MOOCs, courses, online communities, Kaggle, podcasts, Arxiv, social media, conferences, tutoring
WHERE DO DATA SCIENTISTS FIND OPEN DATA?

Finding open data is not that hard, but getting clean open data is often a trying experience. No data scientist wants to waste time cleaning it. There were four clearly popular options here:

- 27% of respondents use GitHub
- 22% use university websites and the data uploaded by them for research
- 20% use data publicly uploaded on official government websites
- 15% source their data through web scraping

Chart: GitHub 27%, University websites 22%, Public govt. data 20%, Scraping 15%, NGOs 7%, Reddit 5%, Others 4%
WHICH OS DO MOST DATA SCIENTISTS USE AT WORK?

The operating system data scientists use plays a crucial role; compatibility with their tools and ease of use are two key factors. For this question, the respondents had a clear liking for one OS:

- Almost 69% of data scientists use Windows
- 24% prefer Linux
- Only 7% prefer macOS

Chart: Windows 69%, Linux 24%, macOS 7%
PREFERRED DEVELOPMENT ENVIRONMENT

An integrated development environment (IDE) is very important for setting up and streamlining data science processes. Our respondents chose the following options from the tools presented to them:

- Almost 38% prefer RStudio
- Close to 37% like using Notebook

Chart: RStudio 38%, Notebook 37%, PyCharm 14%, IDLE 6%, Others 5%
HOW IS CODE SHARED AT YOUR WORKPLACE?

As we said earlier, privacy, operational efficiency and security are of paramount importance in any organisation that deals with data. Here is what we found:

- Over 45% of respondents use Git to share code at work
- 28% said that their organisations use cloud-based programs to share code
- 24% of our readers shared code over non-cloud-based programs

Chart: Git 45%, Cloud-based programs 28%, Non-cloud-based 24%, Others 3%
WHAT IS THE NEURAL NETWORK ARCHITECTURE THAT DATA SCIENTISTS USE MOST FREQUENTLY?

Neural networks are a crucial part of programming as well as data science. We got a clear picture that data scientists, as well as their organisations, use a variety of architectures. According to our study, the convolutional neural network (CNN) was the most frequently used architecture, at 33%.

Chart: CNN 33%, Feedforward NN 25%, RNN 20%, Modular NN 14%, Radial Basis NN 5%, GAN 3%
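The "convolutional" part of a CNN is a small, well-defined operation: slide a kernel over the input and take a weighted sum at each position. A minimal pure-Python sketch of a single "valid" 2D convolution (no padding, stride 1), using a made-up 3x3 input, just to illustrate the operation that real frameworks implement at scale:

```python
def conv2d(image, kernel):
    # 'Valid' 2D cross-correlation (what deep-learning conv layers compute):
    # no padding, stride 1, output shrinks by kernel size minus one
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical 3x3 "image" and a 2x2 difference-style kernel
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

A CNN stacks many such kernels (with learned weights) plus nonlinearities and pooling; the sliding-window structure is what makes it effective on images and other grid-shaped data.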
WHICH BIG DATA TOOL HAVE YOU USED THE MOST?
From open-source tools to paid or customised ones, professionals prefer different tools depending on the project or the organisation they work for. Data scientists in our survey rated their most-favoured big data tools in the following order:
• 52% of the users said they used Hadoop the most
• Almost 22% of data scientists used NoSQL
Hadoop is the favourite big data tool of 52% of respondents.
• Hadoop: 52%
• NoSQL: 22%
• Paid/Customised: 12%
• Hive: 10%
• Polybase: 3%
• Presto: 1%
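Hadoop's popularity rests on the MapReduce programming model it popularized: a map step emits key-value pairs, and a reduce step aggregates them by key. A toy word-count sketch of that pattern in plain Python (an illustration of the idea only, not Hadoop's actual API):

```python
from collections import defaultdict

def map_phase(docs):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in docs:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data tools", "big data at scale"]
print(reduce_phase(map_phase(docs)))
# -> {'big': 2, 'data': 2, 'tools': 1, 'at': 1, 'scale': 1}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many machines, with the framework shuffling pairs so that all counts for one key reach the same reducer.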
WHICH GPUs DO DATA SCIENTISTS USE AT WORK?
Over 19% of our respondents said that they preferred using the Nvidia GeForce GTX 8 Series for intensive data work. The GTX 8 Series is a mid-range GPU, multipurpose and flexible.
34% use low-end GPU models for intensive data work.
WHICH GPU DO YOU USE AT WORK?
• Lower-end models: 34%
• GTX 8 Series: 19%
• GTX 10 Series: 16%
• High-end models: 12%
• GTX 9 Series: 8%
• Tesla K Series: 7%
• Tesla P Series: 4%
OUR RESPONDENTS' PROFILE:

WHICH INDUSTRY DO MOST DATA SCIENTISTS BELONG TO?
• IT: 38%
• Others: 24%
• BFSI: 10%
• Manufacturing: 9%
• Healthcare: 8%
• Ecommerce: 5%
• Retail: 3%
• Customer Service: 3%
37.5% of respondents are from an IT background.

WORK EXPERIENCE:
• 0-2 years: 48.6%
• 2-5 years: 22.1%
• 5-10 years: 10.6%
• 10-15 years: 9.9%
• 15 years and more: 8.8%

HIGHEST FORMAL EDUCATION:
• PG/Master's: 49%
• Graduation: 28%
• Undergraduate: 20%
• PhD: 3%

CITY OF WORK AND RESIDENCE:
• Other: 31%
• Bengaluru: 27%
• Mumbai: 13%
• Hyderabad: 12%
• Delhi/NCR: 10%
• Chennai: 7%
CONCLUSION
As the Analytics industry grows at a 33.5% CAGR, more professionals are expected to segue into the Data Science and Analytics sector. We realised that apart from hard work and dedication, tools and skillsets also play a key role in the success of data scientists. One of the eye-opening inferences was that Python is still the all-time favourite programming language in the Analytics and Data Science sector. The most popular data visualisation tool used in the industry right now is Tableau. Another interesting finding was that professionals were aware of the importance of upskilling themselves and were willing to do so. Most working professionals like to keep themselves updated by watching videos and reading books. Overall, the study paints a positive picture of the Indian Analytics and Data Science sector.
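For context, CAGR compounds the growth rate year over year, so a 33.5% CAGR roughly doubles the industry in about two and a half years. The arithmetic can be checked in a few lines of Python (the base value of 100 is hypothetical; only the 33.5% rate comes from the study):

```python
def grow(value, rate, years):
    """Project a value forward at a compound annual growth rate (CAGR)."""
    return value * (1 + rate) ** years

# Hypothetical base of 100 units growing at the study's 33.5% CAGR:
for year in range(4):
    print(year, round(grow(100, 0.335, year), 1))
```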
RESEARCH METHODOLOGY
The samples were collected by asking respondents to fill in a survey created by Analytics India Magazine about what tools and techniques data scientists use at work. This included various sub-topics such as data visualisation tools, preferred operating systems and programming languages, among others. We took opinions from all those who practice data science, from professionals with less than two years of experience to CXOs, to get a thorough idea of the working environment in this growing field.
ABOUT ANALYTICS INDIA MAGAZINE
Founded in 2012, Analytics India Magazine has since been dedicated to passionately championing and promoting the analytics ecosystem in India. It chronicles technological progress in analytics, artificial intelligence, data science and big data by highlighting the innovations, the players in the field and the challenges shaping the future, through the promotion and discussion of ideas by smart, ardent, action-oriented individuals who want to change the world.
Analytics India Magazine has been a pre-eminent source of news, information and analysis for the Indian analytics ecosystem, covering opinions, analysis and insights on key breakthroughs and developments in data-driven technologies, as well as highlighting how they are being leveraged for future impact. With a dedicated editorial staff and a network of more than 250 expert contributors, AIM's stories are targeted at futurists, AI researchers, data science entrepreneurs, analytics aficionados and technophiles.
ABOUT GREAT LEARNING
Great Learning is an ed-tech company that offers programs in career-critical competencies such as Analytics, Data Science, Machine Learning, Artificial Intelligence, Cloud Computing and Deep Learning. Our programs are taken by thousands of professionals every year who build competencies in these emerging areas to secure and grow their careers.
We are on a mission to make professionals proficient and future-ready. We believe learning a new skill is tough and high-quality education has to be rigorous. In addition to all our programs being extremely comprehensive, a core part of the learning experience is the learning assistance provided to candidates. We use technology, content and a wide network of industry experts (Great Learning Gurus) to help candidates learn in the most impactful manner, whether through our unique blended model of classroom sessions and online content, or online content with personalized weekend mentorship sessions.
Impact
• Great Learning is among the top 5 ed-tech startups in India in terms of revenue and scale
• Over 5,000 professionals have taken Great Learning programs and we have delivered 3.5+ million hours of learning
• We have a network of 500+ Great Learning Gurus, all of whom are industry experts engaged in teaching, guiding and mentoring our candidates through our programs
• Our Analytics program has been ranked as India's no. 1 program for 2015, 2016, 2017 & 2018
• We have learning centres established in 6 cities: Bangalore, Chennai, Hyderabad, Gurgaon, Pune and Mumbai
Great Learning Programs At A Glance
1. PGP BABI:
The Great Learning PG Program in Business Analytics & Business Intelligence is a 12-month program that builds candidates' analytical and management capabilities through a structured learning framework, preparing them for business and functional roles in the Analytics industry.
PGP-BABI is offered in two formats:
• A blended format with weekend classroom sessions and online learning
• Online content with personalized weekend mentorship sessions
The classroom sessions are assisted by online webinars, discussions and assignments that keep your learning continuous and cumulative.
2. BACP (Online):
The Great Learning Business Analytics Certificate Program is India's first mentorship-driven online program, running for 6 months. Students attend interactive sessions with program mentors in small cohorts. Learning sessions are supported by industry interactions, webinars and hands-on projects.
3. PGP DSE:
The Great Learning PG Program in Data Science and Engineering is a 5-month program for early-career professionals looking to expedite their move into roles such as Business Analyst, Data Analyst, Data Engineer and Analytics Engineer by learning relevant data science techniques, tools and technologies, with hands-on application through industry case studies. The program is offered in a boot-camp format with 16 weeks of classroom sessions and 4 weeks of project sessions.
4. PGP AIML:
The Great Learning PG Program in Artificial Intelligence & Machine Learning is designed to develop competence in AI and ML for future-oriented working professionals. PGP-AIML is a 12-month program offered in two formats:
• A blended format with weekend classroom sessions and online learning
• Online content with personalized weekend mentorship sessions
Both formats are designed to suit the tight schedules of busy working professionals. Learning sessions are complemented by hackathons, labs and 12 projects, including a Capstone project. All project submissions are made on GitHub, ensuring that learners can showcase their entire body of work upon completion of the program.
5. PGP CC:
The PG Program in Cloud Computing is a 6-month online program that includes online and live virtual classes. IT professionals not already working with Cloud technologies will gain a solid foundation, while those with some Cloud experience will gain a more structured and hands-on understanding of Cloud technologies, including issues such as migration, deployment, integration, platform choice, architecture and TCO. The program will help you become proficient in working with a range of Cloud environments.
6. PGP-ML:
Great Learning's PG Program in Machine Learning is a 7-month comprehensive program (available in both classroom and online formats) that gives learners a solid grounding in Machine Learning technologies and methodologies. The program is co-created by Great Lakes faculty and industry professionals and includes video lectures, well-defined projects and class assignments to give learners a jump-start in this buzzing field. Learning sessions are complemented by hackathons, 8 hands-on projects and 1 Capstone project.
7. DLCP:
The Deep Learning Certificate Program is a 3-month structured online program with hands-on projects and learning support, all designed to help one become proficient in Deep Learning. Students learn through a combination of world-class online content, industry sessions and a series of projects.
For more information on programs, visit www.greatlearning.in.