Linux/Unix is an operating system that supports multitasking and multi-user operation. It consists of a kernel, a shell, and programs. Unix is widely used on servers and desktops, and is embedded in many other operating systems and devices. Docker is a tool that lets users package applications into containers that can run on any infrastructure, providing a way to deploy applications easily and consistently from development to production. Docker uses a client-server architecture, with a Docker daemon managing containers and images based on requests from a Docker client.
3. Why Data Engineers need to learn networking
Some reasons Data Engineers need to understand computer networks:
·Data Engineers often access servers that can physically exist anywhere (cloud or on-premise).
·They need to understand how communication between machines occurs on a data platform, and how traffic enters and leaves a network.
·When building ETL pipelines, Data Engineers will often work with the infrastructure/network team or SREs (DevOps), so it is necessary to understand computer-networking terminology.
4. OSI Model
What Is the OSI Model?
The Open Systems Interconnection (OSI) model describes seven layers that computer systems use to communicate over a network. It was the first standard model for network communications, adopted by all major computer and telecommunication companies in the early 1980s.
The modern Internet is not based on OSI, but on the
simpler TCP/IP model. However, the OSI 7-layer model is
still widely used, as it helps visualize and communicate
how networks operate, and helps isolate and
troubleshoot networking problems.
OSI was introduced in 1983 by representatives of
the major computer and telecom companies, and
was adopted by ISO as an international standard
in 1984.
5. Ping:
A utility, commonly known as PING, used to query another computer on a TCP/IP network in order to check that there is a connection to it. The user executes it with the name or IP address of the target computer; it then sends out a set of messages asking the remote computer to reply, and produces a short report on whether the connection was achieved. Most operating systems include a simple PING utility, and many commercial and shareware versions are also available. The name is sometimes expanded as Packet InterNet Groper.
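As a quick sketch of typical usage (the database hostname below is a placeholder, not a real server):

```shell
# Query a host by name or IP; -c limits the number of echo requests:
#   ping -c 4 db.example.com        # placeholder hostname
#   ping -c 1 127.0.0.1             # loopback: tests the local TCP/IP stack
# A typical reply line looks like:
#   64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.045 ms
```

The summary at the end reports packets sent, received, and lost, plus round-trip times, which is usually enough to tell connectivity problems apart from latency problems.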
Network Tools
6. telnet:
What is Telnet?
Telnet, developed in 1969, is a protocol that provides a command-line interface for communication with a remote device or server. It is sometimes employed for remote management, and also for the initial setup of devices such as network hardware. Telnet stands for Teletype Network, but it can also be used as a verb: 'to telnet' is to establish a connection using the Telnet protocol.
Is Telnet secure?
Because it was developed before the mainstream adoption of the internet, Telnet on its own does not employ any form of encryption, making it outdated in terms of modern security. It has largely been superseded by the Secure Shell (SSH) protocol (which has its own security considerations around remote access), at least on the public internet, but for instances where Telnet is still in use, there are a few methods for securing your communications.
How does Telnet work?
Telnet provides users with a bidirectional, interactive, text-oriented communication system using a virtual terminal connection. User data is interspersed in-band with Telnet control information over an 8-bit byte-oriented data connection carried by the Transmission Control Protocol (TCP). Historically, Telnet was often used on a terminal to execute functions remotely.
The user connects to the server with the Telnet protocol by entering a command with this syntax: telnet hostname port. The user then executes commands on the server by typing them at the Telnet prompt. To end a session and log off, the user escapes to the Telnet prompt (usually with Ctrl+]) and enters the quit command.
What are common uses for Telnet?
Telnet can be used to test or troubleshoot remote web or mail servers, as well as for remote access to MUDs (multi-user dungeon games) and trusted
internal networks.
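A sketch of the troubleshooting use case: Telnet can speak a plaintext protocol by hand, and where no telnet client is installed, bash can open a raw TCP connection itself. The mail server name below is a placeholder, and the loopback port check is just an illustration:

```shell
# Probe an SMTP server's port with telnet (placeholder hostname):
#   telnet mail.example.com 25
# After connecting you can type protocol commands directly, e.g.:
#   EHLO test
#   QUIT

# Without a telnet client, bash's /dev/tcp can check if a port answers:
if timeout 2 bash -c 'exec 3<>/dev/tcp/127.0.0.1/22' 2>/dev/null; then
  echo "port open"
else
  echo "port closed or filtered"
fi
```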
7. SSH is often used to log in and perform operations on remote computers, but it may also be used for transferring data.
How do I use SSH?
You use a program on your computer (ssh client), to connect to our service (server) and transfer the data to/from our
storage using either a graphical user interface or command line. There are many programs available that enable you to
perform this transfer and some operating systems such as Mac OS X and Linux have this capability built in.
SSH clients will typically support SCP (Secure Copy) and/or SFTP (SSH File Transfer Protocol) for transferring data; we
tend to recommend using SFTP instead of SCP but both will work with our service.
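As an illustrative sketch of the two transfer tools (the username, hostname, and paths are placeholders):

```shell
# Copy a local file to a remote server with SCP:
#   scp ./results.csv user@storage.example.com:/data/incoming/

# Do the same interactively with SFTP:
#   sftp user@storage.example.com
#   sftp> put results.csv /data/incoming/
#   sftp> bye
```

SFTP is generally preferred because it supports resuming, directory listings, and file management over the same session, while SCP only copies files.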
Will I have to use the command line?
No, there are many very good programs with graphical interfaces such as WinSCP for Windows and Cyberduck for Mac OS
X. Please see the access guide for your operating system (Windows, Mac OS X and Linux) for more information.
Why did Research Data Services choose SSH?
SSH enables us to provide a service with encrypted access for the widest range of operating systems (Windows XP-10, Mac OS X and Linux); this would not be possible if we provided Windows networked drives (which utilise the SMB/CIFS communication protocol). SSH is reliable and secure and is often used in the High Performance Computing community for this reason.
ssh:
SSH, or Secure Shell, is a network communication protocol that enables two computers to communicate and share data (cf. HTTP, or Hypertext Transfer Protocol, which is the protocol used to transfer hypertext such as web pages). An inherent feature of SSH is that the communication between the two computers is encrypted, meaning that it is suitable for use on insecure networks.
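A minimal sketch of everyday SSH usage (the username and hostname are placeholders):

```shell
# Log in to a remote machine:
#   ssh analyst@etl-worker.example.com

# Run a single command remotely without opening an interactive shell:
#   ssh analyst@etl-worker.example.com 'df -h /data'

# Set up key-based (password-less) authentication:
#   ssh-keygen -t ed25519
#   ssh-copy-id analyst@etl-worker.example.com
```

Key-based authentication is the usual choice for automated pipelines, since no password needs to be typed or stored in scripts.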
9. What is Unix?
Unix is an operating system that supports multitasking and multi-user operation. It is widely used across all forms of computing systems, such as desktops, laptops, and servers. Unix systems also offer graphical user interfaces (GUIs), similar to Windows, that make navigation easy. With a GUI, using a Unix-based system is easy, but one should still know the Unix commands for cases where a GUI is not available, such as a telnet session.
There are several different versions of UNIX, but they share many similarities. The most popular varieties of UNIX systems are Sun Solaris, GNU/Linux, and Mac OS X.
Any UNIX operating system consists of three parts: kernel, shell, and programs.
Unix is indispensable. From simple command-line applications to connecting to and talking with servers, Unix made possible what GUI-based operating systems could not. Unix underlies all sorts of applications and systems, be it Android, iOS, or the PlayStation. Prospective candidates who intend to work with server technology and administration should definitely learn Unix and get familiar with its commands, use cases, and core principles. In particular, those who manage Linux or Ubuntu systems, or who want to go into big data analytics, should learn to use Unix. Simple Unix commands such as pwd, cd, ls, ls -l, and passwd should be known to every computer science graduate or computing enthusiast.
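A minimal terminal session exercising the basic commands just mentioned (passwd is omitted because it is interactive):

```shell
pwd              # print the absolute path of the current directory
mkdir demo       # make a new directory
cd demo          # change into it
touch notes.txt  # create an empty file
ls               # list directory contents: notes.txt
ls -l            # long listing: permissions, owner, size, timestamp
cd ..            # go back up one level
```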
10. What is Linux?
Just like Windows, iOS, and Mac OS, Linux is an operating system. In fact, one of the most popular platforms on the planet, Android, is powered by the Linux operating system. An operating system is software that manages all of the hardware resources associated with your desktop or laptop. To put it simply, the operating system manages the communication between your software and your hardware. Without the operating system (OS), the software wouldn’t function.
11. The Linux operating system comprises several different pieces:
1. Bootloader – The software that manages the boot process of your computer. For most users, this will simply be a splash screen that pops up and eventually goes away to boot into the operating system.
2. Kernel – This is the one piece of the whole that is actually called 'Linux'. The kernel is the core of the system and manages the CPU, memory, and peripheral devices. The kernel is the lowest level of the OS.
3. Init system – This is a sub-system that bootstraps the user space and is charged with controlling daemons. One of the most widely used init systems is systemd, which also happens to be one of the most controversial. It is the init system that manages the boot process, once the initial booting is handed over from the bootloader (i.e., GRUB or GRand Unified Bootloader).
4. Daemons – These are background services (printing, sound, scheduling, etc.) that either start up during boot or after you log into the desktop.
5. Graphical server – This is the sub-system that displays the graphics on your monitor. It is commonly referred to as the X server or just X.
6. Desktop environment – This is the piece that the users actually interact with. There are many desktop environments to choose from (GNOME, Cinnamon, Mate, Pantheon, Enlightenment, KDE, Xfce, etc.). Each desktop environment includes built-in applications (such as file managers, configuration tools, web browsers, and games).
7. Applications – Desktop environments do not offer the full array of apps. Just like Windows and macOS, Linux offers thousands upon thousands of high-quality software titles that can be easily found and installed. Most modern Linux distributions include App Store-like tools that centralize and simplify application installation. For example, Ubuntu Linux has the Ubuntu Software Center (a rebrand of GNOME Software; Figure 1) which allows you to quickly search among the thousands of apps and install them from one centralized location.
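On a running Linux system you can peek at some of these pieces from the shell. Output varies by machine, and the graphical pieces exist only on desktop installs:

```shell
uname -r            # prints the kernel release, e.g. 6.5.0-generic
# Name of the init process (PID 1); on most modern distros this is systemd:
#   ps -p 1 -o comm=
# List running daemons managed by a systemd-based init system:
#   systemctl list-units --type=service --state=running
```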
13. Docker Basics.
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate
your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your
infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for
shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and
running it in production.
14. The Docker platform
Docker provides the ability to package and run an application in a loosely isolated environment called a container.
The isolation and security allow you to run many containers simultaneously on a given host. Containers are
lightweight and contain everything needed to run the application, so you do not need to rely on what is currently
installed on the host. You can easily share containers while you work, and be sure that everyone you share with
gets the same container that works in the same way.
Docker provides tooling and a platform to manage the lifecycle of your containers:
·Develop your application and its supporting components using containers.
·The container becomes the unit for distributing and testing your application.
·When you’re ready, deploy your application into your production environment, as a container or an orchestrated
service. This works the same whether your production environment is a local data center, a cloud provider, or a
hybrid of the two.
16. What can I use Docker for?
Fast, consistent delivery of your applications
Docker streamlines the development lifecycle by allowing developers to work in standardized environments using local containers which provide your applications and services. Containers are great for continuous integration and continuous delivery (CI/CD) workflows.
Consider the following example scenario:
·Your developers write code locally and share their work with their colleagues using Docker containers.
·They use Docker to push their applications into a test environment and execute automated and manual tests.
·When developers find bugs, they can fix them in the development environment and redeploy them to the test environment for testing and validation.
·When testing is complete, getting the fix to the customer is as simple as pushing the updated image to the production environment.
17. Responsive deployment and scaling
Docker’s container-based platform allows for highly portable workloads. Docker containers can run
on a developer’s local laptop, on physical or virtual machines in a data center, on cloud providers,
or in a mixture of environments.
Docker’s portability and lightweight nature also make it easy to dynamically manage workloads, scaling up or tearing down applications and services as business needs dictate, in near real time.
Running more workloads on the same hardware
Docker is lightweight and fast. It provides a viable, cost-effective alternative to hypervisor-based
virtual machines, so you can use more of your compute capacity to achieve your business goals.
Docker is perfect for high density environments and for small and medium deployments where you
need to do more with fewer resources.
18. Docker architecture
Docker uses a client-server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing your Docker containers. The Docker client and daemon can run on the same system, or you can connect a Docker client to a remote Docker daemon. The Docker client and daemon communicate using a REST API, over UNIX sockets or a network interface. Another Docker client is Docker Compose, which lets you work with applications consisting of a set of containers.
19. The Docker daemon
The Docker daemon (dockerd) listens for Docker API requests and manages Docker objects such as images,
containers, networks, and volumes. A daemon can also communicate with other daemons to manage Docker
services.
The Docker client
The Docker client (docker) is the primary way that many Docker users interact with Docker. When you use
commands such as docker run, the client sends these commands to dockerd, which carries them out. The
docker command uses the Docker API. The Docker client can communicate with more than one daemon.
Docker Desktop
Docker Desktop is an easy-to-install application for your Mac or Windows environment that enables you to build
and share containerized applications and microservices. Docker Desktop includes the Docker daemon
(dockerd), the Docker client (docker), Docker Compose, Docker Content Trust, Kubernetes, and Credential
Helper. For more information, see Docker Desktop.
Docker registries
A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use, and Docker is
configured to look for images on Docker Hub by default. You can even run your own private registry.
When you use the docker pull or docker run commands, the required images are pulled from your configured
registry. When you use the docker push command, your image is pushed to your configured registry.
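A sketch of how the client, daemon, and registry interact. The official ubuntu image is real; the private registry and image names are placeholders:

```shell
# Pull an image from the configured registry (Docker Hub by default):
#   docker pull ubuntu:22.04

# Run it; if the image is missing locally, the daemon pulls it first:
#   docker run -it ubuntu:22.04 /bin/bash

# Push your own image to a registry you are logged in to:
#   docker push myregistry.example.com/myteam/etl-app:1.0
```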
20. Docker objects
When you use Docker, you are creating and using images, containers, networks, volumes, plugins, and other
objects. This section is a brief overview of some of those objects.
Images
An image is a read-only template with instructions for creating a Docker container. Often, an image is based on
another image, with some additional customization. For example, you may build an image which is based on the
ubuntu image, but installs the Apache web server and your application, as well as the configuration details
needed to make your application run.
You might create your own images or you might only use those created by others and published in a registry. To
build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the
image and run it. Each instruction in a Dockerfile creates a layer in the image. When you change the Dockerfile
and rebuild the image, only those layers which have changed are rebuilt. This is part of what makes images so
lightweight, small, and fast, when compared to other virtualization technologies.
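The ubuntu-plus-Apache example described above might look like this as a Dockerfile, written here via a shell heredoc. This is a sketch: the ./app directory and image tag are placeholders, and the build step assumes a running Docker daemon, so it is shown as a comment:

```shell
# Write a minimal Dockerfile for the example described above.
cat > Dockerfile <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y apache2
COPY ./app /var/www/html
CMD ["apachectl", "-D", "FOREGROUND"]
EOF

# Each instruction above becomes one image layer; on a rebuild, only
# layers whose instructions changed are rebuilt:
#   docker build -t my-apache-app .
```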
21. Containers
A container is a runnable instance of an image. You can create, start, stop, move, or delete a container using the
Docker API or CLI. You can connect a container to one or more networks, attach storage to it, or even create a
new image based on its current state.
By default, a container is relatively well isolated from other containers and its host machine. You can control how
isolated a container’s network, storage, or other underlying subsystems are from other containers or from the
host machine.
A container is defined by its image as well as any configuration options you provide to it when you create or start
it. When a container is removed, any changes to its state that are not stored in persistent storage disappear.
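The container lifecycle described above, sketched as CLI commands. The container and image names are placeholders, and a running Docker daemon is assumed, so the commands are shown as a transcript:

```shell
# Create and start a container from an image, in the background:
#   docker run -d --name web my-apache-app

# Stop and restart it:
#   docker stop web
#   docker start web

# Snapshot its current state as a new image:
#   docker commit web my-apache-app:patched

# Remove it; any state not in persistent storage is lost, as noted above:
#   docker rm -f web
```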