Data is all around us and computers play a crucial role in processing and storing it. The internet has increased the role of computers as data handling devices, and we use them more for communication and data processing than actual computations. Examples of data in everyday life include text, phone numbers, and time displayed on watches.
Data Science is defined as a scientific field that uses scientific methods to extract knowledge and insights from structured and unstructured data, and to apply that knowledge and those actionable insights across a broad range of application domains.
Here are some important aspects of data science:
The main goal of data science is to extract knowledge from data, in other words - to understand data, find some hidden relationships and build a model.
Data science uses scientific methods, such as probability and statistics.
The obtained knowledge should be applied to produce actionable insights, i.e. practical insights that you can apply to real business situations.
We should be able to operate on both structured and unstructured data.
Application domain is an important concept, and data scientists often need at least some degree of expertise in the problem domain, for example: finance, medicine, marketing, etc.
As we have already mentioned, data is everywhere. We just need to capture it in the right way! It is useful to distinguish between structured and unstructured data. The former is typically represented in some well-structured form, often as a table or a number of tables, while the latter is just a collection of files. Sometimes we can also talk about semi-structured data, which has some sort of structure that may vary greatly.
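The distinction above can be illustrated with a short sketch. The table, JSON records, and text below are hypothetical stand-ins, built only from Python's standard library:

```python
import csv
import io
import json

# Structured data: a small table of people and phone numbers (hypothetical values).
structured = io.StringIO("name,phone\nAlice,555-0100\nBob,555-0101\n")
rows = list(csv.DictReader(structured))

# Semi-structured data: JSON records whose fields may vary between entries.
semi_structured = json.loads('[{"title": "Paper A", "authors": ["X"]}, {"title": "Paper B"}]')

# Unstructured data: free text with no predefined schema.
unstructured = "Encyclopedia articles, raw video frames, or plain log lines."

# Fields are addressable by name in structured data.
print(rows[0]["phone"])
# Missing fields must be handled explicitly in semi-structured data.
print(semi_structured[1].get("authors", []))
```

Notice that the structured rows can be queried by column name, while the semi-structured records require defensive handling of missing fields, and the free text would need parsing before any field-level access is possible.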
List of people with their phone numbers
Temperature in all rooms of a building at every minute for the last 20 years
Data for age and gender of all people entering the building
Where to get data: Internet of Things, Surveys, Analysis of behavior
Wikipedia pages with links
Collection of scientific papers in JSON format with authors, date of publication, and abstract
Where to get data: Text, Images or Video, Logs
Text of Encyclopedia Britannica
File share with corporate documents
Raw video feed from surveillance camera
Where to get data: Social Network graphs
Applications of data:
Data acquisition
Data storage
Data processing
Predictive model training
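The four stages above can be sketched as a tiny end-to-end pipeline. Everything here is a hypothetical toy: the samples are hard-coded, storage is a plain list, and the "model" is a one-parameter least-squares fit:

```python
# A minimal sketch of the data lifecycle: acquisition -> storage -> processing -> training.
# All numbers and function names are hypothetical illustrations.

def acquire():
    # Data acquisition: in practice this might read sensors, APIs, or files.
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]  # (feature, target) pairs

def store(samples):
    # Data storage: here just an in-memory list; in practice a database or data lake.
    return list(samples)

def process(samples):
    # Data processing: drop invalid rows (a stand-in for real cleaning and transformation).
    return [(x, y) for x, y in samples if y is not None]

def train(samples):
    # Predictive model training: fit y ≈ a*x by least squares (closed form, no intercept).
    num = sum(x * y for x, y in samples)
    den = sum(x * x for x, y in samples)
    a = num / den
    return lambda x: a * x

model = train(process(store(acquire())))
print(model(5.0))
```

Each function stands in for an entire discipline (data engineering, data cleaning, modeling), but the composition shows how the stages feed into one another.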
The Cloud, or Cloud Computing, is the delivery of a wide range of pay-as-you-go computing services hosted on an infrastructure over the internet. Services include solutions such as storage, databases, networking, software, analytics, and intelligent services.
We usually differentiate the Public, Private and Hybrid clouds as follows:
Public cloud: a public cloud is owned and operated by a third-party cloud service provider which delivers its computing resources over the Internet to the public.
Private cloud: refers to cloud computing resources used exclusively by a single business or organization, with services and an infrastructure maintained on a private network.
Hybrid cloud: a hybrid cloud is a system that combines public and private clouds. Users keep some data and applications in an on-premises datacenter while running others on one or more public clouds.
Most cloud computing services fall into three categories: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).
Infrastructure as a Service (IaaS): users rent an IT infrastructure such as servers and virtual machines (VMs), storage, networks, and operating systems.
Platform as a Service (PaaS): users rent an environment for developing, testing, delivering, and managing software applications. Users don’t need to worry about setting up or managing the underlying infrastructure of servers, storage, network, and databases needed for development.
Software as a Service (SaaS): users get access to software applications over the Internet, on demand and typically on a subscription basis. Users don’t need to worry about hosting and managing the software application, the underlying infrastructure or the maintenance, like software upgrades and security patching.
Some of the largest Cloud providers are Amazon Web Services, Google Cloud Platform and Microsoft Azure.
Why Choose the Cloud for Data Science?
Developers and IT professionals choose to work with the Cloud for many reasons, including the following:
Innovation: you can power your applications by integrating innovative services created by Cloud providers directly into your apps.
Flexibility: you only pay for the services that you need and can choose from a wide range of services. You typically pay as you go and adapt your services according to your evolving needs.
Budget: you don’t need to make initial investments to purchase hardware and software, set up and run on-site datacenters and you can just pay for what you use.
Scalability: your resources can scale according to the needs of your project, which means that your apps can use more or less computing power, storage and bandwidth, by adapting to external factors at any given time.
Productivity: you can focus on your business rather than spending time on tasks that can be managed by someone else, such as managing datacenters.
Reliability: Cloud Computing offers several ways to continuously back up your data and you can set up disaster recovery plans to keep your business and services going, even in times of crisis.
Security: you can benefit from policies, technologies and controls that strengthen the security of your project.
Examples of Data Science in the Cloud
Real-time social media sentiment analysis
Let's say you run a news media website and you want to leverage live data to understand what content your readers could be interested in. To learn more about that, you can build a program that performs real-time sentiment analysis of Twitter posts on topics that are relevant to your readers.
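At its core, sentiment analysis assigns a positive or negative score to a piece of text. The following is a deliberately simplified lexicon-based sketch (the word lists and sample posts are hypothetical; production systems use trained models and real streaming data):

```python
# A toy lexicon-based sentiment scorer: count positive vs. negative words.
# The vocabularies below are tiny hypothetical examples.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(text):
    # Score = (#positive words - #negative words) / #words; 0.0 for empty text.
    words = text.lower().split()
    if not words:
        return 0.0
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / len(words)

posts = [
    "I love this great article",
    "terrible coverage really bad",
]
for post in posts:
    print(post, "->", sentiment(post))
```

A real pipeline would subscribe to a live stream, score each incoming post, and aggregate the scores per topic over a time window; the scoring function is simply the smallest interchangeable piece.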
The Azure cloud platform comprises more than 200 products and cloud services designed to help you bring new solutions to life. Data scientists expend a lot of effort exploring and pre-processing data, and trying various types of model-training algorithms to produce accurate models. These tasks are time-consuming, and often make inefficient use of expensive compute hardware.
Azure ML is a cloud-based platform for building and operating machine learning solutions in Azure. It includes a wide range of features and capabilities that help data scientists prepare data, train models, publish predictive services, and monitor their usage. Most importantly, it helps them to increase their efficiency by automating many of the time-consuming tasks associated with training models; and it enables them to use cloud-based compute resources that scale effectively, to handle large volumes of data while incurring costs only when actually used.
Azure ML provides all the tools developers and data scientists need for their machine learning workflows. These include:
Azure Machine Learning Studio: it is a web portal in Azure Machine Learning for low-code and no-code options for model training, deployment, automation, tracking and asset management. The studio integrates with the Azure Machine Learning SDK for a seamless experience.
Jupyter Notebooks: quickly prototype and test ML models.
Automated machine learning UI (AutoML): automates the iterative tasks of machine learning model development, allowing you to build ML models at scale with high efficiency and productivity, all while sustaining model quality.
Azure Machine Learning Designer: lets you drag and drop modules to build experiments and then deploy pipelines in a low-code environment.
Data scientists and AI developers use the Azure Machine Learning SDK to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks, Visual Studio Code, or your favorite Python IDE.
Key areas of the SDK include:
Explore, prepare and manage the lifecycle of your datasets used in machine learning experiments.
Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
Train models either locally or by using cloud resources, including GPU-accelerated model training.
Use automated machine learning, which accepts configuration parameters and training data. It automatically iterates through algorithms and hyperparameter settings to find the best model for running predictions.
Deploy web services to convert your trained models into RESTful services that can be consumed in any application.
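The automated machine learning idea in the list above, iterating over candidate models and hyperparameter settings and keeping the best one, can be sketched in miniature. The candidate family, the "hyperparameter" grid, and the synthetic data below are hypothetical stand-ins, not the Azure ML API:

```python
# A toy illustration of automated model search: try several candidates,
# keep the one with the lowest validation error. Data is generated from y = 3x + 1.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(20)]
train_set, valid = data[:15], data[15:]

def make_linear(slope):
    # Candidate family: fixed-slope lines with an intercept fitted on the training set.
    intercept = sum(y - slope * x for x, y in train_set) / len(train_set)
    return lambda x: slope * x + intercept

def mse(model, samples):
    # Mean squared error on a set of (feature, target) pairs.
    return sum((model(x) - y) ** 2 for x, y in samples) / len(samples)

# "Hyperparameter" grid: candidate slopes to search over.
candidates = {s: make_linear(s) for s in (1.0, 2.0, 3.0, 4.0)}
best_slope, best_model = min(candidates.items(), key=lambda kv: mse(kv[1], valid))
print(best_slope)  # the slope 3.0 matches the data-generating process
```

Services like AutoML apply the same loop at much larger scale, across many algorithm families and hyperparameter spaces, on elastic cloud compute.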