Introduction to DirectPath subsystems
PROPRIETARY and CONFIDENTIAL. NDA REQUIRED. 8/14/2001 10:54 AM Page 1
Copyright Ikadega, Inc. All rights reserved.
Introduction to DirectPath subsystems
(docstechpubsinternal_docssubsystem_intro.doc)
This document contains overview information on the DirectPath™ subsystems. The
information comes from the Ikadega online documentation; the online component, and
not this document, will be the version that will be kept current. This document will be
updated from time to time from the online documents.
Information is currently available for a set of subsystems. More will follow over time.
Note: DirectPath is an evolving and changing system. This document describes the future
vision for each subsystem – how it is expected to look at some future point (such as
when the product first ships). Many parts of the design as described in this document
have not yet been implemented.
Note: Underlined terms are defined in the Ikadega glossary.
Document contents
Internet delivery subsystem
  Component life cycles
The TV delivery and MPEG platform subsystems
  How hospitality systems work
  How ad insertion works
  The jukebox model
  The interactive model (hospitality only)
The volume and file access subsystems
  File system service layers
  Typical uses of the file system
  The UNIX file system
  The file access subsystem
    Access to smaller, named files
    Block search services for the volume access subsystem
  The volume access subsystem
    Aggregation
    Striping
  The hardware layer
  Checkpoints
    A simple example
    Checkpointing states and transitions
      Checkpointing states
      State transitions
  Replication
    Replication and checkpointing
Content transfer engine subsystem
  Inside the CTE
This document describes the subsystems in the groupings shown in the Introduction to
DirectPath:
[Figure: subsystem groupings. Group labels: Application subsystems, Core services, Platforms. Subsystems shown: Internet delivery; TV (MPEG) delivery; Manager; File access; Volume access; IP messaging; Traffic & array control; ITM; Content transfer engine (CTE); Content transfer engine extension (CTEX); Controller event engine (CEE); Open source environment (OSE); MPEG platform (MPP).]
See the introductory document for high-level descriptions of the subsystems. This
document contains more detailed descriptions.
Internet delivery subsystem
The Internet delivery subsystem drives the process of sending content to Internet users.
This picture shows its important components:
[Figure: the Internet delivery subsystem, containing HTTP CTDs and XCTDs, FTP CTDs and XCTDs, RTP CTDs and XCTDs, and so on.]
The subsystem contains a large number of content transfer daemons (CTDs) – generally a
separate one for each end-user session. (End users can have multiple sessions at the same
time.) The CTDs all run the same code, but each has its own event queue and a small
amount of private memory in external RAM.
A CTD’s primary function is data transfer. It has a limited set of commands and
functions, but this allows it to perform them very efficiently. Most of the traffic it handles
is bound for clients outside the DirectPath system. Its main task is to receive data from
the fabric, then place this data into outgoing message frames for the Internet. The daemon
is optimized for data transfer and does only minimal processing of the data.
Most CTDs have corresponding extended content transfer daemons (XCTDs). XCTDs
handle the non-routine processing that the CTDs do not do – the more complex error and
exception processing. CTDs and XCTDs communicate with each other via either ITM or
IP messaging.
The CTDs run in an FPGA on an IP access node. XCTDs run in a supplemental
processor, which is either on an IP access node or a supplemental processor node. The
content transfer engine (CTE) is the logical platform for the CTDs. For the XCTDs, the
platform is the content transfer engine extension (CTEX).
[Figure: the CTD runs in the FPGA, on the CTE platform; the XCTD runs in the supplemental processor, on the CTEX platform.]
In some applications there is one XCTD per CTD, but in other applications an XCTD
might oversee several CTDs. For example, in a streaming video application, one XCTD
might work with the following CTDs, each of which processes a different type of
information:
[Figure: one XCTD working with three CTDs: an RTSP CTD, an HTML CTD, and an RTP CTD. The figure's captions describe three tasks: selecting titles to download; negotiating transfer parameters (speed, format, etc.); and downloading selected titles.]
There are several categories of CTDs and XCTDs – one category for each Internet
service supported by the system (HTTP, FTP, RTP, etc.). Event engines in the content
transfer engine (CTE) alert CTDs when there are events for them to process. This is the
flow of control for a single CTD/XCTD pair:
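[Figure: flow of control. An event engine in the CTE sends a dispatch signal to the CTD in the Internet delivery subsystem. The CTD passes exceptions and complex tasks to the XCTD, and the two exchange commands and replies.]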
This is the overall environment in which a single CTD and XCTD pair operates to handle
one user session:
[Figure: environment for one CTD/XCTD pair. Components shown: CTD, XCTD, Web server daemons, OSE operating system, file access session, volume access session, traffic/array control, and the client (e.g., end user browser). The legend distinguishes components of the Internet delivery subsystem, Ikadega-developed DirectPath objects not in the Internet delivery subsystem, third-party components not developed by Ikadega, and external resources.]
The Web server daemons run in the DirectPath system’s open source environment (OSE).
They must have a very specific server configuration to run effectively with the rest of the
system.
The Web delivery client is an Internet application such as a Web browser or FTP
program. In certain cases, there may also be one or more external resources for the
subsystem to deal with. One example of this is a credit card validation/approval system in
an e-commerce application. Depending on how complex the processing is, the interface
to an external resource could be handled by either the CTD (if only simple processing is
needed, such as reading cookies) or the XCTD (for more complicated processing).
In the initial versions of the system, the XCTD communicates directly with the Web
server daemons. In future versions, this communication may instead go through the
OSE’s operating system.
Component life cycles
The system creates a fixed-size pool of CTDs at boot time, which it allocates one by one
for each new end-user session. The number of possible CTDs generally remains fixed. If
the system exhausts the CTD supply, it cannot create new end-user sessions until a CTD
is de-allocated. This protects the system against denial-of-service (DoS) attacks. By
limiting the number of possible sessions, the system can continue running if it receives
numerous session requests, though it may be temporarily unable to allow new sessions.
(System operators can set the size of the CTD pool in the Web-based configuration
utility.)
Since XCTDs run on a supplemental processor, which has an operating system and a
richer execution environment, there is not a fixed set of XCTDs. The system creates new
ones as needed.
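The fixed-pool behavior described above can be sketched as follows. This is a minimal illustration in Python, not DirectPath code: the class name `CtdPool`, its methods, and the session identifiers are all assumptions made for the example.

```python
class CtdPool:
    """Hypothetical fixed-size pool of CTD slots, created once at boot."""

    def __init__(self, size):
        self.size = size
        self._free = list(range(size))   # CTD ids not bound to a session
        self._in_use = {}                # session id -> CTD id

    def allocate(self, session_id):
        """Bind a free CTD to a new session; return None if the pool is exhausted."""
        if not self._free:
            return None                  # refuse the session, keep running
        ctd_id = self._free.pop()
        self._in_use[session_id] = ctd_id
        return ctd_id

    def release(self, session_id):
        """Return a session's CTD to the pool."""
        ctd_id = self._in_use.pop(session_id)
        self._free.append(ctd_id)


pool = CtdPool(size=2)
first = pool.allocate("session-a")
second = pool.allocate("session-b")
refused = pool.allocate("session-c")     # pool exhausted, request refused
pool.release("session-a")
retry = pool.allocate("session-c")       # succeeds once a CTD is de-allocated
```

Because the pool never grows, a flood of session requests can only exhaust it; existing sessions are unaffected, which is the protection the text describes.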
This is the life cycle of a CTD/XCTD pair supporting a typical end user HTTP session
(assuming a 1:1 relationship between CTDs and XCTDs):
1. When the system receives a request for a new user session, it allocates a CTD and
XCTD (creating a new one as necessary). The system does handshaking with the
client to determine the session type (HTTP, FTP, RTP, etc.). It configures the
CTD/XCTD pair and initializes a context accordingly.
2. As described above, either the XCTD or CTD might take part in authorizing and
validating the end user session.
3. Through an event engine in the content transfer engine (CTE), the CTD receives a
client request to transfer a file. To the CTD, this is simply a command it is not
programmed to process, so it passes it to the XCTD.
4. The XCTD receives the file transfer request and attempts to validate the transfer –
checking to see if the requested file exists and if the end user is authorized to receive
it, etc. If it successfully validates the request, the XCTD generates a handle for the
requested file and passes it to the CTD. Then it tells the CTD to transfer the file.
5. The CTD begins the process of requesting data and preparing it to go out to the
Internet, using various services from other subsystems. At this stage, the XCTD only
becomes involved if there is an error or an exception, or if processing is needed that
the CTD does not know how to do.
During the file transfer, the XCTD knows what file is being downloaded but does not
know any details about the download, such as how much data has been sent so far.
The CTD knows these details but does not know what file it is processing.
6. The CTD notifies the XCTD when the file transfer is done.
7. The previous steps repeat for each subsequent file download requested by the client.
8. When the client sends a request to end the session, the CTD passes it to the XCTD
(again since the CTD is not programmed to process the request). The XCTD de-
allocates itself and the CTD.
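The division of labor in the steps above can be sketched in Python. This is an illustrative model only: the class names, the `GET` and `END` command names, the return strings, and the `FileSystemStub` standing in for the other subsystems are all assumptions, not part of the DirectPath design.

```python
class FileSystemStub:
    """Hypothetical stand-in for the file and volume access subsystems."""

    def __init__(self, files):
        self.files = files            # file name -> list of data blocks
        self._open = {}               # handle -> file name
        self._next_handle = 1

    def exists(self, name):
        return name in self.files

    def open(self, name):
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = name
        return handle

    def read_blocks(self, handle):
        return self.files[self._open[handle]]


class Xctd:
    """Handles whatever the CTD is not programmed to process."""

    def __init__(self, file_system, authorized_users):
        self.file_system = file_system
        self.authorized_users = authorized_users
        self.current_file = None      # the XCTD knows the name, not the progress

    def handle(self, ctd, user, command, argument):
        if command == "GET":
            # Step 4: validate, generate a handle, tell the CTD to transfer.
            if user not in self.authorized_users:
                return "denied"
            if not self.file_system.exists(argument):
                return "not found"
            self.current_file = argument
            return ctd.transfer(self.file_system.open(argument))
        if command == "END":
            # Step 8: the XCTD ends the session for itself and the CTD.
            ctd.active = False
            return "closed"
        return "unsupported"


class Ctd:
    """Moves data; passes every command it does not recognize to its XCTD."""

    def __init__(self, file_system, xctd):
        self.file_system = file_system
        self.xctd = xctd
        self.bytes_sent = 0           # the CTD knows the progress, not the name
        self.active = True

    def on_client_request(self, user, command, argument=None):
        # Steps 3 and 8: in this sketch no client command is handled locally.
        return self.xctd.handle(self, user, command, argument)

    def transfer(self, handle):
        # Steps 5 and 6: move the data, then report completion.
        for block in self.file_system.read_blocks(handle):
            self.bytes_sent += len(block)
        return "done"


fs = FileSystemStub({"index.html": [b"<html>", b"</html>"]})
xctd = Xctd(fs, authorized_users={"alice"})
ctd = Ctd(fs, xctd)
result = ctd.on_client_request("alice", "GET", "index.html")
missing = ctd.on_client_request("alice", "GET", "nope.html")
refused = ctd.on_client_request("bob", "GET", "index.html")
ended = ctd.on_client_request("alice", "END")
```

Note that the CTD only ever sees a handle and a byte count, while the XCTD only ever sees the file name, mirroring the split of knowledge described in step 5.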
The TV delivery and MPEG platform subsystems
The DirectPath TV delivery and MPEG platform subsystems are closely tied together.
This document describes them both. These two subsystems deliver digital video to
support two DirectPath applications:
• Hospitality – A hospitality system provides in-room, on-demand video content to
local end users. (Future hospitality systems may also support in-room Web
browsing, as described later in this document.)
• Ad insertion – In this application, a customer such as a cable TV provider uses a
DirectPath system to insert their own advertisements or other content into a video
signal sent to cable subscribers.
Media server is Ikadega’s name for a DirectPath system used in either of these
applications.
How hospitality systems work
In a typical hospitality application, one or more DirectPath media servers deliver digital
movies to hotel guests. The content can also include things like short advertising videos
for other nearby businesses.
The following picture shows the devices involved in hospitality delivery. The only
Ikadega-supplied component is the media server. The customer supplies and manages the
rest.
[Figure: hospitality delivery. Components shown: the media server (DirectPath system), the end user agent, the facility cable plant, and the TV set in the end user's room.]
To watch a movie, the user interacts with the customer’s end user agent system rather
than with DirectPath. This is the normal sequence of events when an end user wants to
use hospitality services:
1. The end user, on an in-room TV, makes a request to watch a movie or other program
(via an input device such as a remote control).
2. The customer’s end user agent receives this request and queries the media server for
information on the available content.
3. The media server sends the end user agent data on all the selections available,
including the title, running time, description, rating, etc. for each content file.
4. The end user agent takes this information to display menus and help the end user
make a selection. The agent also makes any necessary billing arrangements.
5. When the guest makes a selection, the end user agent directs the media server to
begin playback of the requested program to a specific media server port. The end user
agent tunes the user’s TV to the correct channel to receive the program. This channel
change is invisible to the end user – it does not change the channel number displayed
on the TV.
6. The media server plays the selection as requested. It sends the signal directly to the
end user’s TV through the building’s cable plant. The user may pause or halt
playback at any time. The only involvement the end user agent has during this phase
is to pass any pause/restart/stop commands to the media server.
7. The media server notifies the end user agent when playback is done.
Communication from the agent goes through an end user agent proxy. This is an
application that runs in the open source environment. It handles communication between
the end user agent and the DirectPath controller (DPC). The DPC takes action as
appropriate, which often affects the TV access node.
[Figure: components shown: the external system and, within the media server, the proxy task (in the OSE), the DPC, and the TV access node.]
How ad insertion works
Ad insertion allows a local cable company to substitute its own commercials (usually for
local businesses) for those in the input broadcast (which are often made for a national
audience). This picture shows the major components involved:
[Figure: ad insertion. Components shown: the media server (DirectPath system), ad scheduler, content loader, ad source, cable TV head end, A/B switch, and subscribers. Labeled flows: "Go" command, signal, and new ads.]
Numerous channels of content arrive at the cable TV head end. If there is no ad insertion
happening for a particular channel, that channel’s signal passes unchanged through the
A/B switch and on to the subscribers watching that channel. However, when the head end
receives notification that a commercial is about to start, it signals the ad scheduler
system.
The ad scheduler has a list of the commercials stored on the media server. It decides
whether to replace the national ad with one of these commercials, and then it chooses the
commercial to run. The scheduler sends a command to the media server to play that ad on
the specified channel. The ad scheduler also uses a proxy task to communicate with the
media server.
The media server immediately begins to play the commercial. The A/B switch replaces
the signal coming from the head end with the media server’s output signal. The
subscribers watching that channel see the ad being played by the media server.
From time to time, the content loader receives new digital ads. It passes them on to the
media server for storage, and it also notifies the ad scheduler, so the scheduler has a
current list of which commercials are stored in the media server. The ad scheduler and
content loader can run on the same machine or different machines.
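The interaction among the content loader, ad scheduler, media server, and A/B switch can be sketched as below. All class and function names are hypothetical, and the round-robin choice of ad is an assumption; the text does not say how the scheduler decides whether to replace an ad or which one to pick.

```python
class MediaServer:
    """Stores ads and plays one per channel on command."""

    def __init__(self):
        self.stored_ads = set()
        self.playing = {}                 # channel -> ad currently playing

    def store(self, ad):
        self.stored_ads.add(ad)

    def play(self, ad, channel):
        if ad not in self.stored_ads:
            raise KeyError(ad)
        self.playing[channel] = ad

    def stop(self, channel):
        self.playing.pop(channel, None)


class AdScheduler:
    """Keeps a list of stored ads and picks one when the head end signals."""

    def __init__(self, media_server):
        self.media_server = media_server
        self.available = []               # kept current by the content loader
        self._next = 0

    def ad_loaded(self, ad):
        self.available.append(ad)

    def on_commercial_cue(self, channel):
        if not self.available:
            return None                   # leave the national ad in place
        # Assumed policy: rotate through the stored ads.
        ad = self.available[self._next % len(self.available)]
        self._next += 1
        self.media_server.play(ad, channel)   # the "Go" command
        return ad


def load_ad(ad, media_server, scheduler):
    """Content loader: store the ad, then notify the scheduler."""
    media_server.store(ad)
    scheduler.ad_loaded(ad)


def ab_switch(channel, head_end_signal, media_server):
    """Pass the head-end signal unless the media server is playing on the channel."""
    return media_server.playing.get(channel, head_end_signal)


server = MediaServer()
scheduler = AdScheduler(server)
before = ab_switch(7, "national feed", server)
no_ads = scheduler.on_commercial_cue(7)
load_ad("local-diner-30s", server, scheduler)
chosen = scheduler.on_commercial_cue(7)
during = ab_switch(7, "national feed", server)
server.stop(7)
after = ab_switch(7, "national feed", server)
```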
The jukebox model
The early versions of the media server are designed around a jukebox model, in which it
plays the selection it’s told to by an outside system (the end user agent or ad scheduler).
A later section of this document describes the interactive model, to be implemented
sometime in the future. Whether it’s playing movies or inserting ads, the DirectPath
system has hardware and software in its TV access nodes to support video playback:
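[Figure: TV access node. Each sub-node (sub-node 0, sub-node 1, and so on) contains a microprocessor, shown with the MPEG drivers, OS-9, DAVID, and the application, and an MPEG decoder (and related components). A node-fabric interface exchanges control data with the microprocessor and passes the MPEG image stream to the decoder, which produces the TV signal. The legend distinguishes Ikadega components from third-party components.]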
There are eight sub-nodes on a TV access node, each of which produces one video signal.
Notice that one node-fabric interface (NFIF) handles fabric communication for all of
them. Most of the data arriving at the NFIF is the video data, which it passes directly to
the appropriate MPEG decoder rather than to the microprocessor. (This is similar in
philosophy to how DirectPath storage nodes pass data directly to access nodes without
going through the DirectPath controller.) The data going from the microprocessor to the
node-fabric interface includes requests for more content data from the storage nodes.
The Ikadega application running in the microprocessor would work on tasks such as
closed captioning, providing visuals to accompany audio-only content, and
superimposing text or graphics over the video for weather warnings, logos and other
images. The MPEG decoder’s “related components” from the previous drawing include
logic to support internal MPEG transport, superimposing, and audio-video mixing.
Subsystem information: the Ikadega microprocessor application and end user agent proxy
are part of the TV delivery subsystem, while the other sub-node components are in the
MPEG platform subsystem.
The interactive model (hospitality only)
In future versions of hospitality systems, the user will interact with a Web browser
running in the media server. This provides a more appealing and functional selection
system than that provided by the original end user agent, which tends to be
character-oriented. The design might look like this:
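[Figure: TV access node in the interactive model. Each sub-node (sub-node 0, sub-node 1, and so on) contains a microprocessor, shown with the drivers, OS-9, DAVID, the browser, and an applet (created by the VAR), and an MPEG decoder (and related components). A node-fabric interface exchanges control data with the microprocessor and passes the MPEG image stream to the decoder, which produces the TV signal. The legend distinguishes Ikadega components from third-party components.]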
These are the major differences in the interactive model:
• End users will be able to go on the Internet from their rooms.
• End users who would rather watch a movie than go on the Internet will select
content via an applet running in a browser in the microprocessor. The customer
or VAR will probably create this applet.
• Since the end user will use the browser to make content selections, the end user
agent has a reduced role – it simply passes keystrokes between the end user and
the browser.
• There is no interactive model for ad insertion.
Subsystem information: the applet and end user agent proxy are in the TV delivery
subsystem, while the other sub-node components are in the MPEG platform subsystem.
The volume and file access subsystems
The volume access subsystem and file access subsystem together make up the DirectPath file system.
These subsystems support the reading and writing of data on storage node hard drives.
The file system can accommodate a wide range of uses. In some applications, such as
hospitality systems that primarily play back movies to locally connected TV sets, the file
system holds a relatively small number (in the range of hundreds) of very large files.
These files do not change very frequently, and owners load new files relatively
infrequently (say on a daily or weekly basis). Other customers, however, will use
DirectPath to host and deliver Web sites. These customers need a file system that can
handle large numbers (in the tens of thousands) of small files that change relatively
frequently. Between these two extremes are customers like an online music service, who
must deliver one set of small files (say the Web pages where users select songs to
download) and another set of fairly large ones (the actual MP3 song files). The
DirectPath file system has flexibility to accommodate these varying uses in one design.
The file system consists of two DirectPath subsystems:
• Volume access subsystem – In DirectPath, a volume is a logically contiguous set
of disk sectors. The volume access subsystem is unaware that some volumes
contain multiple files.
• File access subsystem – A DirectPath file is a named portion of a volume. File
accesses go through the volume access subsystem.
Notice from this drawing that all disk accesses go through the volume access subsystem,
either directly or through the file access subsystem:
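[Figure: a client task accessing a file goes through the file access subsystem, which in turn uses the volume access subsystem; a client task accessing a volume goes to the volume access subsystem directly.]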
File system service layers
You can think of the DirectPath file system as a collection of services divided into the
following layers:
[Figure: file system service layers, from highest to lowest, with the UNIX file system shown alongside.]
• Applications – request and work with the data.
• File subsystem – identifies and manages named data files. Implemented here: block
search services for the volume layer. Used here: checkpoints.
• Volume subsystem – locates and places the data on disk. Implemented here:
aggregation, replication, striping, checkpoints. Used here: block search services
from the file layer.
• Hardware layer – reads and writes the data.
The remaining introductory pages describe these components, from the higher-level
directory layer and UNIX file system to the low-level hardware layer.
Typical uses of the file system
DirectPath can actually support multiple file systems running concurrently. The Ikadega-
supplied file system can run together with the UNIX file system. It can also exist in the
same machine as an optional customer-defined file system.
Below are some examples of how customers could use the Ikadega-supplied file
system:
• Local large content delivery, where the system delivers very large files to nearby
users – for example, movies to hotel guests. In this scheme, there usually is only
one company providing the content. Since the data for a content file is not likely
to change, the content rarely if ever goes through different versions. What does
change over time is the set of movies available – new ones are added and older
ones might be removed. The volume access subsystem provides the services for
this type of use.
In this type of system, the file access subsystem exists but is essentially empty –
it just passes I/O requests to the volume access subsystem with little or no
processing.
• Internet delivery, where the system hosts numerous Web sites containing various
file types, from small files to movies. The content on these sites comes from a
number of content providers, and from time to time the system owner may need
to find out who created a certain file. While some of these files may be as large
as the movies described above, there are probably also a number of small files.
The system must be able to locate and process all of these files. It must also be
able to deal with them being replaced frequently. In applications like this, the
system relies on the services of the volume and file access subsystems.
The file access subsystem processes these files.
• UNIX file access, described in the next section.
Most DirectPath systems have a mixture of these file types.
The UNIX file system
A complete UNIX file system may exist in DirectPath to support legacy applications and
other programs that need UNIX services. One example of this is the system event
logging, which could be implemented by using UNIX logging services. Also, the
Manager subsystem, designed to be as UNIX-like as possible, uses UNIX services.
The UNIX file system can perform reads and writes on its own private disk (a disk
invisible to the other file systems), or it can use the volume access subsystem for disk
access, or it can do both. The private disk is currently used in system booting. When it
uses the volume access subsystem for disk access, the UNIX file system has its own
volume, which it thinks is an entire disk. It isn’t aware that the volume access subsystem
is even there.
Possible future directions: In media server applications that mainly deliver digital video,
it may be possible to use the UNIX file system as the only file service, without using the
volume or file access subsystem. (The file access subsystem doesn’t do much in these
applications anyway.) It’s also possible, though, that the file access subsystem might take
over all the functions of the UNIX file system in future versions of the system.
The file access subsystem
The file access subsystem is the highest layer in the file system hierarchy. The nature of
its processing depends on the type of data being processed. The subsystem is mostly
transparent when processing large content files such as digital movies or music – the
volume access subsystem does most of the work on these files. The file access subsystem
becomes important when the system works with numerous, small content files. For
example, if a customer uses a DirectPath system for hosting Web sites, each site will
have a number of relatively small files, and the file access subsystem would process the
individual files in the site (HTML, GIF, etc.).
One key feature of the DirectPath file system is that virtually all of the content transfer
from disk happens in the volume access subsystem rather than the file access subsystem.
This gives the system a speed advantage over traditional file servers.
Access to smaller, named files
The volume access subsystem sees large blocks of data with no internal structure – for
example, digital movie files that are delivered to users from beginning to end. The file
access subsystem gives the system access to many smaller named files, such as the files
that make up a Web site.
[Figure: where the volume access subsystem sees one large volume, the file access subsystem might see a number of smaller named files (f1, f2, f3, ... fn).]
Block search services for the volume access subsystem
To find files, the file system has several different directories:
• Inode directory – an inode is a system data structure that describes a file. An
application references a file by giving the file system an inode number.
• URL directory – this is a table that maps URLs to inodes, in effect providing
URL “names” for the inodes.
• Traditional file system directory – another inode mapping table, but one that
mimics the hierarchical tree structure of subdirectories and files commonly used
in PCs and UNIX machines. These directories also point to (and “name”) inodes.
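One way to picture the three directories is as three lookup tables that all resolve to the same inode. The sketch below is an assumption-laden illustration: the `Inode` fields, the function names, and the use of nested dictionaries for the hierarchical directory are invented for the example and are not taken from the DirectPath design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    number: int
    volume: str          # hypothetical field: volume holding the file
    start_sector: int    # hypothetical field
    sector_count: int    # hypothetical field


inode_directory = {}     # inode number -> Inode
url_directory = {}       # URL -> inode number
tree_directory = {}      # nested dicts: directory -> subtree, file name -> inode number


def add_inode(inode):
    inode_directory[inode.number] = inode


def name_by_url(url, inode_number):
    url_directory[url] = inode_number


def name_by_path(path, inode_number):
    *dirs, leaf = path.strip("/").split("/")
    node = tree_directory
    for name in dirs:
        node = node.setdefault(name, {})
    node[leaf] = inode_number


def find_by_url(url):
    return inode_directory[url_directory[url]]


def find_by_path(path):
    node = tree_directory
    for name in path.strip("/").split("/"):
        node = node[name]
    return inode_directory[node]


add_inode(Inode(number=42, volume="web", start_sector=1000, sector_count=8))
name_by_url("http://www.example.com/index.html", 42)
name_by_path("/sites/example/index.html", 42)
via_url = find_by_url("http://www.example.com/index.html")
via_path = find_by_path("/sites/example/index.html")
```

Both lookups return the same inode object, which is the sense in which the URL and traditional directories merely "name" inodes.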
There are three basic methods for reading content, depending on the nature of the files
involved:
• Locate method – the client task wants to read from a certain offset into a volume,
which it has a handle to. This method is for large files. Here is a typical sequence
of events:
[Figure: locate method. Message flow among the client, file system session, volume system session, and storage node, with steps numbered 1 through 11.]
1: Open request (client sends either a file name or URL). 2: Open reply (returns a handle to the
file, if found). 3: Locate request. 4: Locate reply (returns a map of the file’s block segments). 5:
Volume read request. 6: Sector read request. 7: Data transfer to client buffer (an RDMA transfer).
8 & 9: Request replies. 10: Close request. 11: Close reply.
Steps 5 through 9 repeat until the client has received the entire file.
• Whole file method – for quick access to files small enough to be fully retrieved in
one read operation (such as Web site files). One benefit of this method is that
there are no file open or close operations.
[Figure: whole file method. Message flow among the client, file system session, volume system session, and storage node, with steps numbered 1 through 7.]
1: Read file request (client sends either a file name or URL). 2: File system session passes read
request along. 3: Sector read request. 4: Data transfer to client buffer (an RDMA transfer). 5, 6, 7:
Request replies.
• Traditional method (with Ikadega enhancements) – this method supports file
reads as done on a UNIX system. The method also supports traditional file
operations such as renaming, setting permissions, etc., and it supports DAFS.
[Figure: traditional method. Message flow among the client, file system session, volume system session, and storage node, with steps numbered 1 through 11.]
1: File open request. 2: Open request reply. 3: File read request. 4: Read request passed along. 5:
Sector read request. 6: Data transfer to client buffer (an RDMA transfer). 7, 8, 9: Request replies.
10: File close request. 11: File close reply.
The “Ikadega enhancements” mentioned above include the direct RDMA content
transfer from the storage node to the client. Traditional file systems would send
the content to the client through the volume and file system sessions.
For added flexibility, clients may shift between the locate and traditional methods with
the same file handle.
Note: These drawings assume that the directories are fully cached in memory and that
files are stored contiguously on disk.
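The locate and whole file methods described above can be contrasted in a short Python sketch. Everything here is a simplified assumption: the class and method names are invented, the storage node is folded into `VolumeSession`, and appending to a `bytearray` stands in for the RDMA transfer into the client's buffer.

```python
class VolumeSession:
    """Stand-in for a volume session plus the storage node behind it."""

    def __init__(self, sectors):
        self.sectors = sectors                    # list of byte strings, one per sector

    def read(self, start, count, client_buffer):
        # The data goes straight into the client's buffer (RDMA in the source).
        for sector in self.sectors[start:start + count]:
            client_buffer.extend(sector)


class FileSession:
    """Stand-in for a file system session with its directories cached."""

    def __init__(self, volume_session, directory):
        self.volume_session = volume_session
        self.directory = directory                # name -> [(start, count), ...]
        self._handles = {}
        self._next_handle = 1

    def open(self, name):
        if name not in self.directory:
            return None
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = name
        return handle

    def locate(self, handle):
        """Return the map of the file's block segments."""
        return list(self.directory[self._handles[handle]])

    def close(self, handle):
        del self._handles[handle]

    def read_whole_file(self, name, client_buffer):
        # Whole file method: no open or close; the read is passed along.
        # Assumes the file occupies one contiguous segment.
        start, count = self.directory[name][0]
        self.volume_session.read(start, count, client_buffer)


def read_with_locate(file_session, volume_session, name):
    handle = file_session.open(name)                    # steps 1-2
    if handle is None:
        return None
    buffer = bytearray()
    for start, count in file_session.locate(handle):    # steps 3-4
        volume_session.read(start, count, buffer)       # steps 5-9, repeated
    file_session.close(handle)                          # steps 10-11
    return bytes(buffer)


volume = VolumeSession([b"AA", b"BB", b"CC", b"DD"])
files = FileSession(volume, {"movie": [(0, 1), (2, 2)], "page": [(1, 1)]})
movie = read_with_locate(files, volume, "movie")
page = bytearray()
files.read_whole_file("page", page)
```

In the locate path the client talks to the volume session itself after obtaining the segment map, so the file session stays out of the data path for large files.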
The volume access subsystem
The volume access subsystem supports the file access subsystem and UNIX file system.
Volumes are logical collections of sectors, often organized into categories of content
stored on the system, such as the top N most popular titles and the other less-popular
files. Most volumes contain large content files sized from the hundreds of megabytes to
gigabytes and beyond. Volumes generally have fewer attributes than files – they do not
have items such as access permission data, modification and access dates, checkpoint
information, etc.
The volume access subsystem is where you first start to see disk organization. A disk has
one or more disk slices, each of which contains partitions. Partitions cannot cross disk
slice boundaries. There also is a partition descriptor for each partition in a slice.
[Drawing: disks divided by disk slice boundaries into disk slices, each slice containing partitions and their partition descriptors.]
Disk slices help support the work of offline utility applications. One example of such an
application is a program that pre-loads content before disks are shipped. The system
formats disk slices like conventional operating system partitions.
Note to readers who are familiar with the system’s traffic shaping components: You can
think of the volume access subsystem as a part of the components responsible for storage
array control and fabric traffic.
Aggregation
One method for splitting volumes into partitions is aggregation. This is simply breaking
the content into partitions, which can reside on different disks or storage nodes.
[Drawing: a 100-gigabyte original volume split into a 40 GB partition on Disk 2 and a 60 GB partition on Disk 8.]
With checkpointing or replication, the aggregation can have different boundaries for each
version:
[Drawing: one version split into 40 GB and 60 GB partitions; another version split into 40 GB, 30 GB, and 30 GB partitions.]
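As a rough illustration of aggregation, the sketch below maps a byte offset within a volume to the partition that holds it, using the 40 GB and 60 GB split from the drawing. The `PARTITIONS` layout, the `locate` function, and the use of decimal gigabytes are assumptions made for this example; the document does not describe how DirectPath actually performs this lookup.

```python
GB = 10**9  # decimal gigabytes, an assumption for this sketch

# (disk name, partition size in bytes), listed in volume order.
PARTITIONS = [("Disk 2", 40 * GB), ("Disk 8", 60 * GB)]


def locate(volume_offset, partitions=PARTITIONS):
    """Map a byte offset in the volume to (disk, offset within that partition)."""
    if volume_offset < 0:
        raise ValueError("offset must be non-negative")
    start = 0
    for disk, size in partitions:
        if volume_offset < start + size:
            return disk, volume_offset - start
        start += size
    raise ValueError("offset is beyond the end of the volume")


if __name__ == "__main__":
    print(locate(0))        # first byte of the volume
    print(locate(40 * GB))  # first byte of the second partition
```

A different version of the same volume (for example, the 40/30/30 split) would only need a different partition list passed to `locate`.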
Striping
Striping is another way the system splits content files into partitions. Striping is a disk
storage technique that helps to protect against lost content. It splits up a content file into
equal-length blocks called stripes. The system stores these stripes on N different storage
nodes (here N = 4), along with an additional stripe described below:
[Drawing: the original volume contents, stripes 0 through 7, distributed across storage nodes 1 through 4, with the parity stripes 0^1^2^3 and 4^5^6^7 stored on storage node 5. (^ = exclusive OR)]
Each byte in the parity stripe (at the bottom of the drawing) is the result of an exclusive
OR logic operation on the bytes in the corresponding stripes. For example, the first byte
of the parity stripe is the result of an exclusive OR performed on the first bytes of stripes
0 through 3. If the system can’t read one of the stripes (say if there is a disk or storage
node error), it can re-create the lost data by performing an exclusive OR on the parity stripe
and the remaining stripes.
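The parity calculation and the recovery of a lost stripe can be sketched in a few lines. This is a minimal illustration of the exclusive OR technique described above, not DirectPath code: the function names are hypothetical and the four short byte strings stand in for real stripes.

```python
from functools import reduce


def xor_bytes(a, b):
    """Exclusive OR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_stripe(stripes):
    """Parity stripe for a group of equal-length stripes."""
    return reduce(xor_bytes, stripes)


def rebuild_lost_stripe(surviving_stripes, parity):
    """Re-create the one missing stripe from the survivors and the parity."""
    return reduce(xor_bytes, surviving_stripes, parity)


if __name__ == "__main__":
    # Stripes 0 through 3, one per storage node (N = 4).
    stripes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
    parity = parity_stripe(stripes)

    # Pretend the node holding stripe 2 has failed.
    survivors = stripes[:2] + stripes[3:]
    print(rebuild_lost_stripe(survivors, parity) == stripes[2])
```

The recovery works because XORing a value with itself cancels out, so XORing the parity with every surviving stripe leaves only the missing one. It can recover a single lost stripe per parity group, not two.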
The hardware layer
This layer contains the hard disks and their controlling hardware and software, all located
on storage nodes. The layer doesn’t know the meaning of the data it reads and writes. It
just responds to specific commands. Most of the work it does is read operations, but it
does write to disk as well, to load new content or make copies of volumes.
Every storage node has multiple sub-nodes (two of them at present), each of which
controls one ATA-type hard disk drive. The sub-nodes have custom disk-controlling
hardware as well as interfaces to the fabric. Since the whole DirectPath system is
designed to keep the disk drives as busy as possible, the system makes heavy demands on
the drives, and there is very little room for malfunctions or disk errors. Field service
people can replace disks “on the fly” (while the rest of the system continues to run and
deliver content) to remove a faulty disk, install a drive with more capacity, or insert a
disk pre-filled with new content.
To assist the disk activity scheduler, DirectPath maintains a set of performance history
data on each disk. This data reflects the actual performance of each individual drive
(rather than the specifications for the drive type).
Checkpoints
Checkpoints allow the system to keep multiple file versions on disk, mostly to ensure
read consistency – each end user getting files from the same file set. This is useful in
many applications, especially with frequently updated files such as Web files. If a site is
popular, a number of people might be using it when one of its periodic file updates is due.
If a site changes frequently, at some point there will be users with files open from several
revisions ago, especially users with slow Internet connections. With checkpointing, the new
and older versions of the site co-exist while the system loads new files to the disk drives.
The files for a new checkpoint become available to users when the file system commits
them. A commit operation updates the partition descriptors for the volume. New users
aren’t able to use the new checkpoint until all of the new files are committed
successfully. Users that had site files open at the start of the update see only files from the
checkpoint that was most current when they started their sessions. If the system halts during the
loading stage (before it can commit a new checkpoint), the checkpoint and its new
content are lost. The system retains the previous committed checkpoints, though.
The DirectPath customer can specify how many checkpoints to keep for each volume.
The system generally re-uses the storage space of expired versions. This can be a rapid
process – on some systems that change content very quickly, the resources for a replaced
checkpoint may be re-used in as little as 4 minutes.
The checkpoint feature is implemented in the volume access subsystem, but DirectPath
only uses it on the smaller named files processed by the file access subsystem. At any
given moment, a checkpointed volume has files from 0 to n checkpoints available, and it
may also have a new checkpoint in progress.
A simple example
To understand checkpointing, take an example volume that only has five files. (The
example is small and unrealistic, but it demonstrates the basics of how checkpointing
works.) Suppose there is a DirectPath system that hosts Web sites, and it receives five
files for the initial version of a site. The files are a.html, b.html, c.html, d.gif, and e.gif.
When the new files arrive, if they are to go in a new volume, the application software
allocates a certain amount of space for the volume. At this point there is officially nothing
in the volume – none of the blocks is committed, though the file system may have started
loading the files to disk.
[Drawing: the space allocated for the volume, with data being stored to disk.]
At this stage, the files for the example Web site are on their way to disk, but they are not
yet available to users.
When the files are stored successfully, the file system can commit the checkpoint. After it
does this, new users open the files from this first checkpoint:
a.html (1..latest)
b.html (1..latest)
c.html (1..latest)
d.gif (1..latest)
e.gif (1..latest)
Most recently committed: 1
Oldest retained checkpoint: 1
The (m..n) notation indicates the checkpoints each file belongs to. (The system does not
store this information with the files, however – it maintains it in memory only.) Oldest
retained checkpoint and Most recently committed are two variables the system uses to
keep track of a volume’s checkpoints.
Now, suppose sometime later there is a change to the a.html file, where a completely
new version of the file replaces the first version. The system stores the new version in the
first available free space, and then commits it:
a.html (1) [invalidated]
b.html (1..latest)
c.html (1..latest)
d.gif (1..latest)
e.gif (1..latest)
a.html (2..latest) [new]
Most recently committed: 2
Oldest retained checkpoint: 1
At this point, the volume contains files from checkpoints 1 and 2. The commit invalidates
the first version of a.html, which means that the file is still there but is no longer in the
latest checkpoint. However, the file is still valid for sessions using checkpoint 1, if any. If
certain conditions are met later (see below), the file system could eventually re-allocate
the storage used by this first version of a.html. The users are not aware of the
checkpointing or the different file versions.
Suppose now that there are two file changes for the next checkpoint. The site owner changes
the b.html file, which had been the only file referencing d.gif. The new b.html no longer
uses the graphic file. In addition to invalidating the old b.html, the system invalidates
d.gif – the file does not apply to the new checkpoint or the ones that follow it (unless one
of the HTML files is changed to refer to it again). d.gif and the original b.html are still
valid for users of checkpoints 1 and 2. Here’s what the volume looks like after the
commit:
a.html (1)
b.html (1..2)
c.html (1..latest)
d.gif (1..2)
e.gif (1..latest)
a.html (2..latest)
b.html (3..latest)
Most recently committed: 3
Oldest retained checkpoint: 1
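The (m..n) notation lends itself to a small lookup that answers which file versions a session on a given checkpoint sees. The sketch below encodes the volume as it stands after checkpoint 3 commits. The tuple layout, the `LATEST` marker, and the function name are assumptions for illustration only; as noted earlier, the system keeps this information in memory and does not store it with the files.

```python
LATEST = None  # stands for the open-ended "latest" in the (m..n) notation

# (file name, first checkpoint, last checkpoint) after checkpoint 3 commits.
VOLUME = [
    ("a.html", 1, 1),
    ("b.html", 1, 2),
    ("c.html", 1, LATEST),
    ("d.gif", 1, 2),
    ("e.gif", 1, LATEST),
    ("a.html", 2, LATEST),
    ("b.html", 3, LATEST),
]


def files_in_checkpoint(checkpoint, volume=VOLUME):
    """Return {file name: first checkpoint of the version} seen at `checkpoint`."""
    visible = {}
    for name, first, last in volume:
        if first <= checkpoint and (last is LATEST or checkpoint <= last):
            visible[name] = first
    return visible


if __name__ == "__main__":
    for cp in (1, 2, 3):
        print(cp, files_in_checkpoint(cp))
```

A session on checkpoint 2 still sees d.gif and the original b.html, while a session on checkpoint 3 sees the new b.html and no d.gif, matching the walkthrough above.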
What eventually happens to the previous version of b.html, d.gif, and other invalidated
files depends on the customer’s file allocation policies. The system’s operators probably
want to keep files from at least some of the previous checkpoints, in which case these
files would remain there unchanged. However, to keep disk clutter down, most customers
also want to limit the number of checkpoints remaining on disk. So if a customer chooses
a checkpoint limit, it affects what the file system does when there is a new checkpoint. If
the following conditions are both true, then the file system could mark a file reclaimable
(discarded and available for re-use by new files):
• There are currently no sessions using the checkpoint in question, and
• The file’s checkpoint number is no newer than the new checkpoint’s number minus
the checkpoint limit (for example, with a limit of 5 and committing checkpoint 8,
a file from checkpoint 3, 2, or 1 qualifies)
The file system marks a file as reclaimable only if both conditions are true for it. If the
file system marks a file as reclaimable, its blocks no longer contain valid data, though the
system might not re-use them right away.
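The two reclaim conditions can be expressed as a single predicate. The sketch below is one possible reading, not the DirectPath implementation: it assumes each file version belongs to a contiguous range of checkpoints (`first` to `last`), treats "the file's checkpoint number" as `last`, and follows the worked example, in which a limit of 5 and a new checkpoint 8 make checkpoints 3, 2, and 1 eligible. All names are hypothetical.

```python
def is_reclaimable(first, last, new_checkpoint, checkpoint_limit, active_checkpoints):
    """Return True if a file version may be marked reclaimable.

    first, last        -- checkpoints the file version belongs to (assumed range)
    new_checkpoint     -- number of the checkpoint being committed
    checkpoint_limit   -- how many checkpoints the customer wants to keep
    active_checkpoints -- checkpoints that still have user sessions
    """
    # Condition 1: no session is using a checkpoint that contains this file.
    in_use = any(first <= cp <= last for cp in active_checkpoints)
    # Condition 2: the file is outside the window of retained checkpoints.
    too_old = last <= new_checkpoint - checkpoint_limit
    return (not in_use) and too_old


if __name__ == "__main__":
    # Limit of 5, committing checkpoint 8, no active sessions.
    for last in (1, 2, 3, 4):
        print(last, is_reclaimable(1, last, 8, 5, set()))
```

With these assumptions, files last seen in checkpoint 4 or later stay on disk, and an active session on an old checkpoint blocks reclamation regardless of the limit.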
Checkpointing states and transitions
This document describes some of the data the DirectPath file system uses to process
checkpoints. It also shows how these variables change in response to various checkpoint
events.
Note: The information in this document applies only to customer environments with
relatively small numbers of content providers. This document does not apply to
environments where there are numerous content providers.
To support checkpointing, the volume access subsystem and the file access subsystem
maintain the following variables:
• For each volume:
o OldestRetained – this is the number of the earliest checkpoint the system
must still honor (retain the files for). Recall from About checkpoints that
some end users could still be using files from one or more previous
checkpoints.
o LatestCommitted – the number of the most recently committed checkpoint
for a volume.
o OldestBeingUsed – the oldest checkpoint that still has active user sessions.
• For each file:
o Modified – this is the number of the checkpoint containing the latest version
of a particular file.
o Invalidated – the checkpoint when the file was removed from the latest
checkpoint, though it still may be in use by active user sessions.
Note: The DirectPath file system currently tracks these two variables for each
file. It could in the future track them by block instead.
Checkpointing states
The following table shows the different states a file can be in. Note, though, that the
DPFS does not store these file states anywhere. The state of each file is implied by the
values of the above variables. The reason for this is system performance and consistency
– if a checkpoint that affects, say, 200 files aborts before committing, the system would
be slowed down by first marking and then un-marking all 200 files. Also, if the system
halts in the middle of a 200-file update, the file system would not be able to correctly tell
which files are members of each checkpoint when it comes back up.
The tables use two abbreviations: F for file and V for volume. For example,
F.Modified means the Modified variable for a file, and V.LatestCommitted is the
LatestCommitted value for a volume.
Implied state | Values causing that state | Comment
Free | F.Invalidated <= V.OldestRetained | When the DirectPath file system wants to create new files for a checkpoint, it first allocates free space for them.
Newly allocated | F.Modified > V.LatestCommitted; F.Invalidated == NULL | This is the status of a new file being created for a future (not yet committed) checkpoint. If the file system commits the checkpoint, the file’s state becomes In-use retained. If the system instead aborts the checkpoint, it makes the file’s resources free again for re-use.
In-use retained | F.Modified <= V.LatestCommitted; F.Invalidated == NULL | This is a normal file state – it means that the file is part of the volume’s most recently committed checkpoint.
Invalidation pending | F.Invalidated > V.LatestCommitted | In this state, the file system is in the process of creating a checkpoint that, when committed, will invalidate the file. If the commit operation completes, the status changes to Retained. If the system does not commit the new checkpoint, it eventually returns the file to the In-use retained state (sometime before it commits the next checkpoint).
Retained | F.Invalidated > V.OldestRetained; F.Invalidated <= V.LatestCommitted | A file in this state has been invalidated. It is no longer in the volume’s latest checkpoint, but it is still part of an older checkpoint that has current users. When all of this checkpoint’s users end their sessions, the system could delete the file (put it in the Free state, in a sense) and re-use its resources, depending on its checkpoint retention policy.
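Because the states are implied rather than stored, they can be derived on demand from the four variables. The function below is a sketch of that derivation based only on the conditions in the table; the function name, the use of `None` for NULL, and the order in which the conditions are tested are assumptions. One further assumption is labeled in the code: a file with neither variable set is treated as Free.

```python
def implied_state(modified, invalidated, oldest_retained, latest_committed):
    """Derive a file's implied checkpoint state from F.* and V.* variables.

    modified, invalidated             -- F.Modified and F.Invalidated (None = NULL)
    oldest_retained, latest_committed -- V.OldestRetained and V.LatestCommitted
    """
    if invalidated is not None:
        if invalidated <= oldest_retained:
            return "Free"
        if invalidated > latest_committed:
            return "Invalidation pending"
        return "Retained"
    if modified is None:
        # Assumption: the table does not cover this case explicitly.
        return "Free"
    if modified > latest_committed:
        return "Newly allocated"
    return "In-use retained"


if __name__ == "__main__":
    # Volume after checkpoint 3 commits: OldestRetained = 1, LatestCommitted = 3.
    print(implied_state(1, 2, 1, 3))     # first version of a.html
    print(implied_state(1, None, 1, 3))  # c.html
```

Deriving the state this way means an aborted checkpoint never has to mark and un-mark individual files, which is the performance argument made above.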
State transitions
This drawing shows the checkpoint states a typical file goes through during its life span:
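[Drawing: a file moves from Newly allocated to In-use retained on a commit, and to Newly invalidated on a later commit. The drawing then asks "Too old to retain?" (is F.Invalidated < the smaller of V.OldestRetained or V.OldestBeingUsed?). If no, the file is Retained and the question is asked again at the next commit; if yes, the file becomes Free.]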
The following table shows how the system updates variables and changes implied states
for individual files in response to different checkpoint events.
From state | To state | Triggering event | Variables modified
Free | Newly allocated | File allocation | F.Modified := V.LatestCommitted + 1
Newly allocated | Free | File delete | F.Invalidated := V.LatestCommitted + 1
Newly allocated | Free | Checkpoint abort | F.Modified := NULL
Newly allocated | In-use retained | Commit | Increment V.LatestCommitted
In-use retained | Retained | Commit, when V.OldestRetained is still < F.Invalidated | Increment V.LatestCommitted
In-use retained | Free | Commit, when V.OldestRetained is now >= F.Invalidated | Increment V.LatestCommitted
Retained | Free | Commit, when V.OldestRetained is now >= F.Invalidated | Increment V.LatestCommitted
Invalidation pending | In-use retained | Checkpoint abort | F.Invalidated := NULL
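The variable updates in the table can be replayed against the a.html example to see how little work a commit or an abort involves. The sketch below is illustrative only. The class and function names are hypothetical, and `mark_invalidated` applies the table's File delete assignment to a file that is being replaced, which is an assumption: the table does not list the event that moves an In-use retained file to Invalidation pending.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    latest_committed: int = 0  # V.LatestCommitted


@dataclass
class File:
    modified: Optional[int] = None     # F.Modified (None = NULL)
    invalidated: Optional[int] = None  # F.Invalidated (None = NULL)


def allocate(f, v):
    """File allocation: F.Modified := V.LatestCommitted + 1."""
    f.modified = v.latest_committed + 1


def mark_invalidated(f, v):
    """Assumed invalidation step: F.Invalidated := V.LatestCommitted + 1."""
    f.invalidated = v.latest_committed + 1


def commit(v):
    """Commit: increment V.LatestCommitted. No per-file updates are needed."""
    v.latest_committed += 1


def abort(files, v):
    """Checkpoint abort: reset variables that refer to the uncommitted checkpoint."""
    for f in files:
        if f.modified is not None and f.modified > v.latest_committed:
            f.modified = None
        if f.invalidated is not None and f.invalidated > v.latest_committed:
            f.invalidated = None


if __name__ == "__main__":
    vol = Volume()
    a_v1 = File()
    allocate(a_v1, vol)
    commit(vol)                  # checkpoint 1
    a_v2 = File()
    allocate(a_v2, vol)
    mark_invalidated(a_v1, vol)
    commit(vol)                  # checkpoint 2
    print(a_v1, a_v2, vol)
```

In this sketch a commit touches only one volume-level counter, which fits the earlier point that file states are implied rather than stored.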
Replication
Replication is a disk storage method for protecting against lost content by making
complete copies of volumes on different storage nodes. It is also useful when one copy of a
content file is not enough to meet the peak demand for the file, and it improves the
system’s peak throughput. System managers can use replication to place copies of
popular content near the physical outer edges of disk drives, where data is read faster
(since more bytes pass under the read/write head in the same amount of time than near
the middle of the platter). A replication strategy describes how a customer wants to
replicate volumes.
Replication of content is always a relatively low-priority activity, not important enough
to interfere with sending out content. As a result, there is lagged replication – the system
usually finishes the copy operation somewhat after finishing the storage of the original
content.
Replication and checkpointing
On systems using checkpointing, lagged replication follows behind the commitment of
each checkpoint. For example, when the system originally allocates space for a volume,
the replicated copy doesn’t exist yet:
[Drawing: the newly allocated original volume, with no copy yet.]
As the original volume grows and changes, the copy might lag behind like this:
Original volume | Copy
CPs included: 1 | (no checkpoints yet)
CPs: 1, 2 | CPs: 1
CPs: 1, 2, 3 | CPs: 1, 2
At this point, if it needed data from checkpoints 1 or 2, the file system could get it from
either the original or the copy (assuming that the file in question has been invalidated in
either of these checkpoints).
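The effect of lagged replication on where data can be read from can be shown with a small lookup. This is a sketch under stated assumptions: the `REPLICAS` mapping and the `sources_for` function are hypothetical names, and the checkpoint sets are taken from the last stage described above (original with checkpoints 1, 2, and 3; copy with checkpoints 1 and 2).

```python
# Checkpoints present on each replica at the last stage described above.
REPLICAS = {
    "original": {1, 2, 3},
    "copy": {1, 2},
}


def sources_for(checkpoint, replicas=REPLICAS):
    """Return the replicas that already hold the given checkpoint."""
    return [name for name, checkpoints in replicas.items() if checkpoint in checkpoints]


if __name__ == "__main__":
    for cp in (1, 2, 3):
        print(cp, sources_for(cp))
```

Until the copy catches up, requests for checkpoint 3 can be served only by the original, while checkpoints 1 and 2 can be served by either.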
Content transfer engine subsystem
A content transfer engine (CTE) is a platform that supports a number of content transfer
daemons (CTDs). In the initial implementation, it works with CTDs in the Internet
delivery subsystem. Together the two subsystems send data to Internet users.
A supplemental processor initializes and oversees the CTE. Sometimes these two
components are on the same access node:
[Drawing: an IP access node containing both the content transfer engine and the supplemental processor, connected to the fabric and the Internet.]
Here the supplemental processor is on a separate supplemental processor node:
[Drawing: a supplemental processor node containing the supplemental processor, and a separate IP access node containing the content transfer engine, connected to the fabric and the Internet.]
The CTE is implemented as a set of fixed-function engines and re-programmable micro-
engines, all on a field-programmable gate array (FPGA). The engine’s major functions
are buffer management and two communication interfaces going to the fabric and the
Internet.
Inside the CTE
This drawing shows the CTE’s internal structure:
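[Drawing: the Content Transfer Engine, containing a node-fabric interface engine (connected to the fabric), an external network interface engine (connected to the external network), an event engine, three event queues (EventQ), and buffer and memory control (connected to external RAM).]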
There are multiple event engines – more than one, but perhaps only a few. Each event
engine has its own queue. The event engines receive notices about events to be processed
by particular CTDs. The event engine dispatches an event by waking up the correct CTD
to process it:
[Drawing: in the Content Transfer Engine subsystem, event queues (EventQ n-1, EventQ n, and so on) feed their event engines (Event engine n-1, Event engine n), which send dispatch signals to the CTDs in the Internet delivery subsystem.]
The CTE assigns an arriving event to an event engine in this way:
• If the CTE currently has no other pending events for the particular CTD, the
event goes to a randomly chosen event engine.
• However, if the event queues already contain at least one event for that CTD,
then all events for that CTD must go through the same event engine, until they
are all dispatched.
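The two assignment rules above amount to a per-CTD affinity that lasts only while that CTD has events pending. The sketch below models this in software purely for illustration; the CTE itself is described as FPGA logic, and the class name, method names, and data structures here are all assumptions.

```python
import random
from collections import deque


class EventDispatcher:
    """Toy model of assigning CTD events to event engine queues."""

    def __init__(self, num_engines, rng=None):
        self.queues = [deque() for _ in range(num_engines)]
        self.pending = {}  # ctd_id -> (engine index, number of queued events)
        self.rng = rng or random.Random()

    def assign(self, ctd_id, event):
        """Queue an event and return the index of the engine chosen for it."""
        if ctd_id in self.pending:
            # Rule 2: events for a CTD with pending work share one engine.
            engine, count = self.pending[ctd_id]
            self.pending[ctd_id] = (engine, count + 1)
        else:
            # Rule 1: otherwise pick an engine at random.
            engine = self.rng.randrange(len(self.queues))
            self.pending[ctd_id] = (engine, 1)
        self.queues[engine].append((ctd_id, event))
        return engine

    def dispatch(self, engine):
        """Remove and return the next (ctd_id, event) from an engine's queue."""
        ctd_id, event = self.queues[engine].popleft()
        _, count = self.pending[ctd_id]
        if count == 1:
            del self.pending[ctd_id]  # affinity ends once nothing is pending
        else:
            self.pending[ctd_id] = (engine, count - 1)
        return ctd_id, event


if __name__ == "__main__":
    d = EventDispatcher(num_engines=4)
    first = d.assign("ctd-a", "event 1")
    second = d.assign("ctd-a", "event 2")
    print(first == second)
```

Keeping all pending events for one CTD on a single queue preserves their arrival order for that CTD, which is one plausible motivation for the rule; the document itself does not state the reason.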
One of the critical issues for the content transfer engine is memory management. The
CTE makes a minimum of memory transfers as it receives data from storage nodes and
sends it to the external network. It does this by initially storing data from the storage
nodes into external RAM buffers, then sending it to the Internet from those same buffers.
A supplemental processor runs a component called the content transfer engine extension
(CTEX). CTEX runs extended content transfer daemons (XCTDs), which extend the
content processing functions beyond the CTE’s limited processing scope. For example,
XCTDs have more flexible access to shared data than CTDs do.

More Related Content

What's hot

CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
Inter process communication using Linux System Calls
Inter process communication using Linux System CallsInter process communication using Linux System Calls
Inter process communication using Linux System Callsjyoti9vssut
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSKathirvel Ayyaswamy
 
Basic features of distributed system
Basic features of distributed systemBasic features of distributed system
Basic features of distributed systemsatish raj
 
Operating system support in distributed system
Operating system support in distributed systemOperating system support in distributed system
Operating system support in distributed systemishapadhy
 

What's hot (14)

CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Os
OsOs
Os
 
Operating system
Operating systemOperating system
Operating system
 
1.intro. to distributed system
1.intro. to distributed system1.intro. to distributed system
1.intro. to distributed system
 
Distributed System
Distributed System Distributed System
Distributed System
 
CS6601 DISTRIBUTED SYSTEMS
CS6601 DISTRIBUTED SYSTEMSCS6601 DISTRIBUTED SYSTEMS
CS6601 DISTRIBUTED SYSTEMS
 
Inter process communication using Linux System Calls
Inter process communication using Linux System CallsInter process communication using Linux System Calls
Inter process communication using Linux System Calls
 
CS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMSCS9222 ADVANCED OPERATING SYSTEMS
CS9222 ADVANCED OPERATING SYSTEMS
 
Coda file system tahir
Coda file system   tahirCoda file system   tahir
Coda file system tahir
 
Distributed Operating System_4
Distributed Operating System_4Distributed Operating System_4
Distributed Operating System_4
 
Basic features of distributed system
Basic features of distributed systemBasic features of distributed system
Basic features of distributed system
 
Distributed Coordination-Based Systems
Distributed Coordination-Based SystemsDistributed Coordination-Based Systems
Distributed Coordination-Based Systems
 
Chapter16 new
Chapter16 newChapter16 new
Chapter16 new
 
Operating system support in distributed system
Operating system support in distributed systemOperating system support in distributed system
Operating system support in distributed system
 

Viewers also liked

Presentation for CIADC
Presentation for CIADCPresentation for CIADC
Presentation for CIADCLauren Serra
 
WatsonBruce NanoInk sample
WatsonBruce NanoInk sampleWatsonBruce NanoInk sample
WatsonBruce NanoInk sampleBruce Watson
 
Syntax Error Draft 4.12.2012 (1)
Syntax Error Draft 4.12.2012 (1)Syntax Error Draft 4.12.2012 (1)
Syntax Error Draft 4.12.2012 (1)Al Ho
 
WatsonBruce Ikadega sample 2
WatsonBruce Ikadega sample 2WatsonBruce Ikadega sample 2
WatsonBruce Ikadega sample 2Bruce Watson
 
Ishmael Mayhew_Professional Persona Projects
Ishmael Mayhew_Professional Persona ProjectsIshmael Mayhew_Professional Persona Projects
Ishmael Mayhew_Professional Persona ProjectsIshmael_Mayhew
 
Floaters - The story of an eye
Floaters - The story of an eyeFloaters - The story of an eye
Floaters - The story of an eyeJames Lehmer
 
237861559 final-project-colgate
237861559 final-project-colgate237861559 final-project-colgate
237861559 final-project-colgateHiren Valera
 
Materi Turunan
Materi TurunanMateri Turunan
Materi TurunanSridayani
 

Viewers also liked (14)

Presentation for CIADC
Presentation for CIADCPresentation for CIADC
Presentation for CIADC
 
Real Estate
Real EstateReal Estate
Real Estate
 
shahul updated resume
shahul updated resumeshahul updated resume
shahul updated resume
 
WatsonBruce NanoInk sample
WatsonBruce NanoInk sampleWatsonBruce NanoInk sample
WatsonBruce NanoInk sample
 
Project Report New
Project Report NewProject Report New
Project Report New
 
Syntax Error Draft 4.12.2012 (1)
Syntax Error Draft 4.12.2012 (1)Syntax Error Draft 4.12.2012 (1)
Syntax Error Draft 4.12.2012 (1)
 
WatsonBruce Ikadega sample 2
WatsonBruce Ikadega sample 2WatsonBruce Ikadega sample 2
WatsonBruce Ikadega sample 2
 
Ishmael Mayhew_Professional Persona Projects
Ishmael Mayhew_Professional Persona ProjectsIshmael Mayhew_Professional Persona Projects
Ishmael Mayhew_Professional Persona Projects
 
Floaters - The story of an eye
Floaters - The story of an eyeFloaters - The story of an eye
Floaters - The story of an eye
 
Brand Building
Brand BuildingBrand Building
Brand Building
 
CD Case Study
CD Case StudyCD Case Study
CD Case Study
 
237861559 final-project-colgate
237861559 final-project-colgate237861559 final-project-colgate
237861559 final-project-colgate
 
Materi Turunan
Materi TurunanMateri Turunan
Materi Turunan
 
Question 2
Question 2Question 2
Question 2
 

Similar to WatsonBruce Ikadega sample 1

Distributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applicationsDistributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applicationsJaime Martin Losa
 
Citrix command lines
Citrix command linesCitrix command lines
Citrix command linesprincesly
 
Intranet Messaging Project Report -phpapp02
Intranet Messaging Project Report -phpapp02Intranet Messaging Project Report -phpapp02
Intranet Messaging Project Report -phpapp02dvicky12
 
Unit 3 Assignment 1 Osi Model
Unit 3 Assignment 1 Osi ModelUnit 3 Assignment 1 Osi Model
Unit 3 Assignment 1 Osi ModelJacqueline Thomas
 
Infrastructure student
Infrastructure studentInfrastructure student
Infrastructure studentJohn Scrugham
 
Disadvantages Of Robotium
Disadvantages Of RobotiumDisadvantages Of Robotium
Disadvantages Of RobotiumSusan Tullis
 
Os rtos.ppt
Os rtos.pptOs rtos.ppt
Os rtos.pptrahul km
 
Net framework key components - By Senthil Chinnakonda
Net framework key components - By Senthil ChinnakondaNet framework key components - By Senthil Chinnakonda
Net framework key components - By Senthil Chinnakondatalenttransform
 
What's the Right Messaging Standard for the IoT?
What's the Right Messaging  Standard for the IoT?What's the Right Messaging  Standard for the IoT?
What's the Right Messaging Standard for the IoT?Angelo Corsaro
 
EMBEDDED WEB TECHNOLOGY
EMBEDDED WEB TECHNOLOGYEMBEDDED WEB TECHNOLOGY
EMBEDDED WEB TECHNOLOGYVinay Kumar
 
Driver Configuration Webinar
Driver Configuration WebinarDriver Configuration Webinar
Driver Configuration WebinarAVEVA
 
Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...
Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...
Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...Andrea Tino
 
Analysis Of Internet Protocol ( IP ) Datagrams
Analysis Of Internet Protocol ( IP ) DatagramsAnalysis Of Internet Protocol ( IP ) Datagrams
Analysis Of Internet Protocol ( IP ) DatagramsEmily Jones
 
The .net remote systems
The .net remote systemsThe .net remote systems
The .net remote systemsRaghu nath
 
Linux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop ComputerLinux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop ComputerIOSR Journals
 
Linux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop ComputerLinux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop ComputerIOSR Journals
 

Similar to WatsonBruce Ikadega sample 1 (20)

Distributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applicationsDistributed Systems: How to connect your real-time applications
Distributed Systems: How to connect your real-time applications
 
Citrix command lines
Citrix command linesCitrix command lines
Citrix command lines
 
Intranet Messaging Project Report -phpapp02
Intranet Messaging Project Report -phpapp02Intranet Messaging Project Report -phpapp02
Intranet Messaging Project Report -phpapp02
 
Unit 3 Assignment 1 Osi Model
Unit 3 Assignment 1 Osi ModelUnit 3 Assignment 1 Osi Model
Unit 3 Assignment 1 Osi Model
 
Infrastructure student
Infrastructure studentInfrastructure student
Infrastructure student
 
Lesson 2
Lesson 2Lesson 2
Lesson 2
 
Disadvantages Of Robotium
Disadvantages Of RobotiumDisadvantages Of Robotium
Disadvantages Of Robotium
 
Os rtos.ppt
Os rtos.pptOs rtos.ppt
Os rtos.ppt
 
Net framework key components - By Senthil Chinnakonda
Net framework key components - By Senthil ChinnakondaNet framework key components - By Senthil Chinnakonda
Net framework key components - By Senthil Chinnakonda
 
TenAsys.Fall07
TenAsys.Fall07TenAsys.Fall07
TenAsys.Fall07
 
What's the Right Messaging Standard for the IoT?
What's the Right Messaging  Standard for the IoT?What's the Right Messaging  Standard for the IoT?
What's the Right Messaging Standard for the IoT?
 
EMBEDDED WEB TECHNOLOGY
EMBEDDED WEB TECHNOLOGYEMBEDDED WEB TECHNOLOGY
EMBEDDED WEB TECHNOLOGY
 
Embedded os
Embedded osEmbedded os
Embedded os
 
nv.ppt
nv.pptnv.ppt
nv.ppt
 
Driver Configuration Webinar
Driver Configuration WebinarDriver Configuration Webinar
Driver Configuration Webinar
 
Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...
Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...
Implementation of a Deadline Monotonic algorithm for aperiodic traffic schedu...
 
Analysis Of Internet Protocol ( IP ) Datagrams
Analysis Of Internet Protocol ( IP ) DatagramsAnalysis Of Internet Protocol ( IP ) Datagrams
Analysis Of Internet Protocol ( IP ) Datagrams
 
The .net remote systems
The .net remote systemsThe .net remote systems
The .net remote systems
 
Linux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop ComputerLinux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop Computer
 
Linux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop ComputerLinux-Based Data Acquisition and Processing On Palmtop Computer
Linux-Based Data Acquisition and Processing On Palmtop Computer
 

WatsonBruce Ikadega sample 1

  • 1. Introduction to DirectPath subsystems PROPRIETARY and CONFIDENTIAL. NDA REQUIRED. 8/14/2001 10:54 AM Page 1 Copyright Ikadega, Inc. All rights reserved.
    Introduction to DirectPath subsystems (docstechpubsinternal_docssubsystem_intro.doc)
    This document contains overview information on the DirectPath™ subsystems. The information comes from the Ikadega online documentation; the online component, and not this document, will be the version that will be kept current. This document will be updated from time to time from the online documents. Information is currently available for a set of subsystems. More will follow over time.
    Note: DirectPath is an evolving and changing system. This document describes the future vision for each subsystem – how it is expected to look at some future point (such as when the product first ships). Many parts of the design as described in this document have not yet been implemented.
    Note: Underlined terms are defined in the Ikadega glossary.
    Document contents
      Internet delivery subsystem (p. 2)
        Component life cycles (p. 4)
      The TV delivery and MPEG platform subsystems (p. 6)
        How hospitality systems work (p. 6)
        How ad insertion works (p. 8)
        The jukebox model (p. 9)
        The interactive model (hospitality only) (p. 10)
      The volume and file access subsystems (p. 11)
        File system service layers (p. 12)
        Typical uses of the file system (p. 12)
        The UNIX file system (p. 13)
        The file access subsystem (p. 13)
          Access to smaller, named files (p. 14)
          Block search services for the volume access subsystem (p. 14)
        The volume access subsystem (p. 16)
          Aggregation (p. 17)
          Striping (p. 18)
        The hardware layer (p. 18)
        Checkpoints (p. 19)
          A simple example (p. 19)
          Checkpointing states and transitions (p. 23)
            Checkpointing states (p. 23)
            State transitions (p. 24)
        Replication (p. 25)
          Replication and checkpointing (p. 26)
      Content transfer engine subsystem (p. 27)
        Inside the CTE (p. 28)
  • 2. This document describes the subsystems in the groupings shown in the Introduction to DirectPath:
    [Figure: subsystem groupings (application subsystems, core services, platforms). Labels shown: Internet delivery; TV (MPEG) delivery; Manager; File access; IP messaging; Volume access; Traffic & array control; ITM; Content transfer engine (CTE); Controller event engine (CEE); Open source environment (OSE); MPEG platform (MPP); Content transfer engine extension (CTEX).]
    See the introductory document for high-level descriptions of the subsystems. This document contains more detailed descriptions.
    Internet delivery subsystem
    The Internet delivery subsystem drives the process of sending content to Internet users. This picture shows its important components:
    [Figure: Internet delivery subsystem, containing HTTP CTDs and XCTDs, FTP CTDs and XCTDs, RTP CTDs and XCTDs, and so on.]
    The subsystem contains a large number of content transfer daemons (CTDs) – generally a separate one for each end-user session. (End users can have multiple sessions at the same time.) The CTDs all run the same code, but each has its own event queue and a small amount of private memory in external RAM.
    A CTD’s primary function is data transfer. It has a limited set of commands and functions, but this allows it to perform them very efficiently. Most of the traffic it handles is bound for clients outside the DirectPath system. Its main task is to receive data from the fabric, then place this data into outgoing message frames for the Internet. The daemon is optimized for data transfer and does only minimal processing of the data.
    Most CTDs have corresponding extended content transfer daemons (XCTDs). XCTDs handle the non-routine processing that the CTDs do not do – the more complex error and exception processing.
  • 3. CTDs and XCTDs communicate with each other via either ITM or IP messaging.
    The CTDs run in an FPGA on an IP access node. XCTDs run in a supplemental processor, which is either on an IP access node or a supplemental processor node. The content transfer engine (CTE) is the logical platform for the CTDs. For the XCTDs, the platform is the content transfer engine extension (CTEX).
    [Figure: the CTD runs in the FPGA, on the CTE; the XCTD runs in the supplemental processor, on the CTEX.]
    In some applications there is one XCTD per CTD, but in other applications an XCTD might oversee several CTDs. For example, in a streaming video application, one XCTD might work with the following CTDs, each of which processes a different type of information:
    [Figure: one XCTD working with an RTSP CTD, an HTML CTD, and an RTP CTD. Captions: for selecting titles to download; for negotiating transfer parameters (speed, format, etc.); for downloading selected titles.]
    There are several categories of CTDs and XCTDs – one category for each Internet service supported by the system (HTTP, FTP, RTP, etc.).
    Event engines in the content transfer engine (CTE) alert CTDs when there are events for them to process. This is the flow of control for a single CTD/XCTD pair:
    [Figure: flow of control within the Internet delivery subsystem. Labels: event engine (CTE); dispatch signal; CTD; XCTD; exceptions and complex tasks; commands and replies.]
  • 4. This is the overall environment in which a single CTD and XCTD pair operates to handle one user session:
    [Figure: environment of one CTD/XCTD pair. Elements shown: CTD; XCTD; Web server daemons; OSE OS; file access session; volume access session; traffic/array control; client (e.g., end user browser). The legend distinguishes components of the Internet delivery subsystem, Ikadega-developed DirectPath objects not in the Internet delivery subsystem, third-party components not developed by Ikadega, and external resources.]
    The Web server daemons run in the DirectPath system’s open source environment (OSE). They must have a very specific server configuration to run effectively with the rest of the system.
    The Web delivery client is an Internet application such as a Web browser or FTP program.
    In certain cases, there may also be one or more external resources for the subsystem to deal with. One example of this is a credit card validation/approval system in an e-commerce application. Depending on how complex the processing is, the interface to an external resource could be handled by either the CTD (if only simple processing is needed, such as reading cookies) or the XCTD (for more complicated processing).
    In the initial versions of the system, the XCTD communicates directly with the Web server daemons. In future versions, this communication may instead go through the OSE’s operating system.
    Component life cycles
    The system creates a fixed-size pool of CTDs at boot time and allocates them one at a time, one for each new end-user session. The number of possible CTDs generally remains fixed. If the system exhausts the CTD supply, it cannot create new end-user sessions until a CTD is de-allocated. This protects the system against denial-of-service (DoS) attacks. By limiting the number of possible sessions, the system can continue running if it receives numerous session requests, though it may be temporarily unable to allow new sessions. (System operators can set the size of the CTD pool in the Web-based configuration utility.)
    Since XCTDs run on a supplemental processor, which has an operating system and a richer execution environment, there is not a fixed set of XCTDs. The system creates new ones as needed.
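    The fixed-size pool described above can be illustrated with a minimal Python sketch. This is not the DirectPath implementation (CTDs run in an FPGA); the class name `CtdPool` and its methods are hypothetical and only model the allocation rule: a fixed number of slots, no growth, and refusal of new sessions when the pool is empty.

```python
from typing import List, Optional, Set


class CtdPool:
    """Hypothetical model of a fixed-size CTD pool created once at boot."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("pool size must be at least 1")
        self._free: List[int] = list(range(size))  # CTD ids not yet allocated
        self._in_use: Set[int] = set()

    def allocate(self) -> Optional[int]:
        """Return a free CTD id, or None when the pool is exhausted."""
        if not self._free:
            return None  # no new session until some CTD is de-allocated
        ctd_id = self._free.pop()
        self._in_use.add(ctd_id)
        return ctd_id

    def release(self, ctd_id: int) -> None:
        """De-allocate a CTD so a later session can reuse it."""
        if ctd_id not in self._in_use:
            raise ValueError(f"CTD {ctd_id} is not allocated")
        self._in_use.remove(ctd_id)
        self._free.append(ctd_id)


pool = CtdPool(size=2)
first = pool.allocate()
second = pool.allocate()
refused = pool.allocate()   # pool exhausted: the request is refused, nothing grows
pool.release(first)
reused = pool.allocate()    # the released CTD becomes available again
```

    Because the pool never grows, a flood of session requests costs the system nothing beyond the refusals themselves, which is the denial-of-service property the text describes.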
  • 5. This is the life cycle of a CTD/XCTD pair supporting a typical end user HTTP session (assuming a 1:1 relationship between CTDs and XCTDs):
    1. When the system receives a request for a new user session, it allocates a CTD and an XCTD (creating a new XCTD as necessary). The system does handshaking with the client to determine the session type (HTTP, FTP, RTP, etc.). It configures the CTD/XCTD pair and initializes a context accordingly.
    2. As described above, either the XCTD or CTD might take part in authorizing and validating the end user session.
    3. Through an event engine in the content transfer engine (CTE), the CTD receives a client request to transfer a file. To the CTD, this is simply a command it is not programmed to process, so it passes it to the XCTD.
    4. The XCTD receives the file transfer request and attempts to validate the transfer – checking to see if the requested file exists, if the end user is authorized to receive it, etc. If it successfully validates the request, the XCTD generates a handle for the requested file and passes it to the CTD. Then it tells the CTD to transfer the file.
    5. The CTD begins the process of requesting data and preparing it to go out to the Internet, using various services from other subsystems. At this stage, the XCTD only becomes involved if there is an error or an exception, or if processing is needed that the CTD does not know how to do. During the file transfer, the XCTD knows what file is being downloaded but does not know any details about the download, such as how much data has been sent so far. The CTD knows these details but does not know what file it is processing.
    6. The CTD notifies the XCTD when the file transfer is done.
    7. The previous steps repeat for each subsequent file download requested by the client.
    8. When the client sends a request to end the session, the CTD passes it to the XCTD (again since the CTD is not programmed to process the request). The XCTD de-allocates itself and the CTD.
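    The division of labor in steps 3 through 6 can be sketched in Python as follows. This is an illustrative model only: the class names, the command strings (`"GET"`, `"TRANSFER"`), and the `fabric` dictionary standing in for data arriving over the fabric are all assumptions, and the XCTD here looks up a pre-assigned handle rather than generating one.

```python
class Xctd:
    """Hypothetical slow path: validates requests the CTD cannot process."""

    def __init__(self, names):
        self.names = names          # file name -> handle (stand-in for validation data)
        self.current_file = None    # the XCTD knows which file, not the progress
        self.completed = []

    def handle(self, ctd, command, arg):
        if command == "GET":
            handle = self.names.get(arg)
            if handle is None:               # validation failed: no such file
                return "rejected"
            self.current_file = arg
            return ctd.on_event("TRANSFER", handle)  # tell the CTD to transfer
        return "unsupported"

    def transfer_done(self, handle):
        # Step 6: the CTD reports completion; only the XCTD can name the file.
        self.completed.append(self.current_file)
        self.current_file = None


class Ctd:
    """Hypothetical fast path: streams bytes for an opaque handle."""

    def __init__(self, xctd, fabric):
        self.xctd = xctd
        self.fabric = fabric        # handle -> bytes (stand-in for the fabric)
        self.bytes_sent = 0         # the CTD knows the progress, not the file name

    def on_event(self, command, arg=None):
        if command == "TRANSFER":   # the only command this CTD is programmed for
            data = self.fabric[arg]
            self.bytes_sent += len(data)
            self.xctd.transfer_done(arg)
            return "done"
        return self.xctd.handle(self, command, arg)  # everything else is passed on


xctd = Xctd({"index.html": 1})
ctd = Ctd(xctd, fabric={1: b"<html>"})
result = ctd.on_event("GET", "index.html")
```

    Note how neither object holds the other's state: the CTD counts bytes against a handle, and the XCTD tracks the file name, mirroring step 5.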
  • 6. The TV delivery and MPEG platform subsystems
    The DirectPath TV delivery and MPEG platform subsystems are closely tied together. This document describes them both.
    These two subsystems deliver digital video to support two DirectPath applications:
    • Hospitality – A hospitality system provides in-room, on-demand video content to local end users. (Future hospitality systems may also support in-room Web browsing, as described later in this document.)
    • Ad insertion – In this application, a customer such as a cable TV provider uses a DirectPath system to insert their own advertisements or other content into a video signal sent to cable subscribers.
    Media server is Ikadega’s name for a DirectPath system used in either of these applications.
    How hospitality systems work
    In a typical hospitality application, one or more DirectPath media servers deliver digital movies to hotel guests. The content can also include things like short advertising videos for other nearby businesses.
    The following picture shows the devices involved in hospitality delivery. The only Ikadega-supplied component is the media server. The customer supplies and manages the rest.
    [Figure: hospitality delivery. Labels: media server (DirectPath system); facility cable plant; TV set in the end user's room; end user agent.]
  • 7. To watch a movie, the user interacts with the customer’s end user agent system rather than with DirectPath. This is the normal sequence of events when an end user wants to use hospitality services:
    1. The end user, on an in-room TV, makes a request to watch a movie or other program (via an input device such as a remote control).
    2. The customer’s end user agent receives this request and queries the media server for information on the available content.
    3. The media server sends the end user agent data on all the selections available, including the title, running time, description, rating, etc. for each content file.
    4. The end user agent uses this information to display menus and help the end user make a selection. The agent also makes any necessary billing arrangements.
    5. When the guest makes a selection, the end user agent directs the media server to begin playback of the requested program to a specific media server port. The end user agent tunes the user’s TV to the correct channel to receive the program. This channel change is invisible to the end user – it does not change the channel number displayed on the TV.
    6. The media server plays the selection as requested. It sends the signal directly to the end user’s TV through the building’s cable plant. The user may pause or halt playback at any time. The only involvement the end user agent has during this phase is to pass any pause/restart/stop commands to the media server.
    7. The media server notifies the end user agent when playback is done.
    Communication from the agent goes through an end user agent proxy. This is an application that runs in the open source environment. It handles communication between the end user agent and the DirectPath controller (DPC). The DPC takes action as appropriate, which often affects the TV access node.
    [Figure: an external system communicates with a proxy task in the OSE, which communicates with the DPC and the TV access node, all inside the media server.]
  • 8. How ad insertion works
    Ad insertion allows a local cable company to substitute its own commercials (usually for local businesses) for those in the input broadcast (which are often made for a national audience). This picture shows the major components involved:
    [Figure: ad insertion components. Labels: media server (DirectPath system); ad scheduler; content loader; cable TV head end; A/B switch; "Go" command; signal; new ads; ad source; subscribers.]
    Numerous channels of content arrive at the cable TV head end. If there is no ad insertion happening for a particular channel, that channel’s signal passes unchanged through the A/B switch and on to the subscribers watching that channel.
    However, when the head end receives notification that a commercial is about to start, it signals the ad scheduler system. The ad scheduler has a list of the commercials stored on the media server. It decides whether to replace the national ad with one of these commercials, and then it chooses the commercial to run. The scheduler sends a command to the media server to play that ad on the specified channel. The ad scheduler also uses a proxy task to communicate with the media server.
    The media server immediately begins to play the commercial. The A/B switch replaces the signal coming from the head end with the media server’s output signal. The subscribers watching that channel see the ad being played by the media server.
    From time to time, the content loader receives new digital ads. It passes them on to the media server for storage, and it also notifies the ad scheduler, so the scheduler has a current list of which commercials are stored in the media server.
    The ad scheduler and content loader can run on the same machine or different machines.
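    As a rough illustration of the ad insertion flow above, the Python sketch below models the ad scheduler's list of stored commercials and the A/B switch. All names are hypothetical, and the round-robin choice of commercial is an assumption; the source does not say how the scheduler decides.

```python
class AdScheduler:
    """Hypothetical ad scheduler: tracks stored ads and picks one per cue."""

    def __init__(self):
        self.ads = []       # ids of commercials currently stored on the media server
        self._next = 0

    def on_new_ad(self, ad_id):
        # The content loader notifies the scheduler after storing a new ad.
        self.ads.append(ad_id)

    def on_cue(self, channel):
        """Called when the head end signals that a commercial is about to start.

        Returns a play command for the media server, or None to leave the
        national ad in place. Round-robin selection is an assumption.
        """
        if not self.ads:
            return None
        ad_id = self.ads[self._next % len(self.ads)]
        self._next += 1
        return ("play", ad_id, channel)


def ab_switch(head_end_signal, media_server_signal):
    """Pass the head end signal unless the media server is playing an ad."""
    if media_server_signal is not None:
        return media_server_signal
    return head_end_signal


scheduler = AdScheduler()
scheduler.on_new_ad("local-ad-1")
command = scheduler.on_cue(channel=7)
```

    With no local ads stored, `on_cue` returns `None` and `ab_switch` keeps passing the head end signal through unchanged.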
  • 9. The jukebox model
    The early versions of the media server are designed around a jukebox model, in which it plays the selection it’s told to by an outside system (the end user agent or ad scheduler). A later section of this document describes the interactive model, to be implemented sometime in the future.
    Whether it’s playing movies or inserting ads, the DirectPath system has hardware and software in its TV access nodes to support video playback:
    [Figure: TV access node. Labels: sub-node 0; sub-node 1; microprocessor; application; DAVID; OS-9; MPEG drivers; MPEG decoder (and related components); node-fabric interface; TV signal; control data; MPEG image stream. The legend distinguishes Ikadega components from third-party components.]
    There are eight sub-nodes on a TV access node, each of which produces one video signal. Notice that one node-fabric interface (NFIF) handles fabric communication for all of them.
    Most of the data arriving at the NFIF is the video data, which it passes directly to the appropriate MPEG decoder rather than to the microprocessor. (This is similar in philosophy to how DirectPath storage nodes pass data directly to access nodes without going through the DirectPath controller.) The data going from the microprocessor to the node-fabric interface includes requests for more content data from the storage nodes.
    The Ikadega application running in the microprocessor would work on tasks such as closed captioning, providing visuals to accompany audio-only content, and superimposing text or graphics over the video for weather warnings, logos and other images.
    The MPEG decoder’s “related components” from the previous drawing include logic to support internal MPEG transport, superimposing, and audio-video mixing.
    Subsystem information: the Ikadega microprocessor application and end user agent proxy are part of the TV delivery subsystem, while the other sub-node components are in the MPEG platform subsystem.
  • 10. The interactive model (hospitality only)
    In future versions of hospitality systems, the user will interact with a Web browser running in the media server. This provides a more appealing and functional selection system than that provided by the original end user agent, which tends to be character-oriented. The design might look like this:
    [Figure: TV access node in the interactive model. Labels: sub-node 0; sub-node 1; microprocessor; applet (created by VAR); browser; DAVID; OS-9; drivers; MPEG decoder (and related components); node-fabric interface; TV signal; control data; MPEG image stream. The legend distinguishes Ikadega components from third-party components.]
    These are the major differences in the interactive model:
    • End users will be able to go on the Internet from their rooms.
    • End users who would rather watch a movie than go on the Internet will select content via an applet running in a browser in the microprocessor. The customer or VAR will probably create this applet.
    • Since the end user will use the browser to make content selections, the end user agent has a reduced role – it simply passes keystrokes between the end user and the browser.
    • There is no interactive model for ad insertion.
    Subsystem information: the applet and end user agent proxy are in the TV delivery subsystem, while the other sub-node components are in the MPEG platform subsystem.
  • 11. The volume and file access subsystems
    The volume access subsystem and file access subsystem together make up the DirectPath file system. These subsystems support the reading and writing of data on storage node hard drives.
    The file system can accommodate a wide range of uses. In some applications, such as hospitality systems that primarily play back movies to locally connected TV sets, the file system holds a relatively small number (in the range of hundreds) of very large files. These files do not change very frequently, and owners load new files relatively infrequently (say on a daily or weekly basis).
    Other customers, however, will use DirectPath to host and deliver Web sites. These customers need a file system that can handle large numbers (in the tens of thousands) of small files that change relatively frequently.
    Between these two extremes are customers like an online music service, which must deliver one set of small files (say the Web pages where users select songs to download) and another set of fairly large ones (the actual MP3 song files). The DirectPath file system has the flexibility to accommodate these varying uses in one design.
    The file system consists of two DirectPath subsystems:
    • Volume access subsystem – In DirectPath, a volume is a logically continuous set of disk sectors. The volume access subsystem is unaware that some volumes contain multiple files.
    • File access subsystem – A DirectPath file is a named portion of a volume. File accesses go through the volume access subsystem.
    Notice from this drawing that all disk accesses go through the volume access subsystem, either directly or through the file access subsystem:
    [Figure: a client task accessing a file goes through the file access subsystem and then the volume access subsystem; a client task accessing a volume goes directly to the volume access subsystem.]
  • 12. File system service layers
    You can think of the DirectPath file system as a collection of services divided into the following layers:
      Applications: request and work with the data.
      File subsystem: identifies and manages named data files. Implemented here: block search services for the volume layer. Used here: checkpoints.
      Volume subsystem: locates and places the data on disk. Implemented here: aggregation, replication, striping, checkpoints. Used here: block search services from the file layer.
      Hardware layer: reads and writes the data.
      UNIX file system (also shown in the drawing).
    The remaining introductory pages describe these components, from the higher-level directory layer and UNIX file system to the low-level hardware layer.
    Typical uses of the file system
    DirectPath can actually support multiple file systems running concurrently. The Ikadega-supplied file system can run together with the UNIX file system. It can also exist in the same machine as an optional customer-defined file system.
    Below are some examples of how customers could use the Ikadega-supplied file systems:
    • Local large content delivery, where the system delivers very large files to nearby users – for example, movies to hotel guests. In this scheme, there usually is only one company providing the content. Since the data for a content file is not likely to change, the content rarely if ever goes through different versions. What does change over time is the set of movies available – new ones are added and older ones might be removed. The volume access subsystem provides the services for this type of use. In this type of system, the file access subsystem exists but is essentially empty – it just passes I/O requests to the volume access subsystem with little or no processing.
  • 13. • Internet delivery, where the system hosts numerous Web sites containing various file types, from small files to movies. The content on these sites comes from a number of content providers, and from time to time the system owner may need to find out who created a certain file. While some of these files may be as large as the movies described above, there are probably also a number of small files. The system must be able to locate and process all of these files. It must also be able to deal with them being replaced frequently. In applications like this, the system relies on the services of the volume and file access subsystems. The file access subsystem processes these files.
    • UNIX file access, described in the next section.
    Most DirectPath systems have a mixture of these file types.
    The UNIX file system
    A complete UNIX file system may exist in DirectPath to support legacy applications and other programs that need UNIX services. One example of this is the system event logging, which could be implemented by using UNIX logging services. Also, the Manager subsystem, designed to be as UNIX-like as possible, uses UNIX services.
    The UNIX file system can perform reads and writes on its own private disk (a disk invisible to the other file systems), or it can use the volume access subsystem for disk access, or it can do both. The private disk is currently used in system booting.
    When it uses the volume access subsystem for disk access, the UNIX file system has its own volume, which it thinks is an entire disk. It isn’t aware that the volume access subsystem is even there.
    Possible future directions: In media server applications that mainly deliver digital video, it may be possible to use the UNIX file system as the only file service, without using the volume or file access subsystem. (The file access subsystem doesn’t do much in these applications anyway.) It’s also possible, though, that the file access subsystem might take over all the functions of the UNIX file system in future versions of the system.
    The file access subsystem
    The file access subsystem is the highest layer in the file system hierarchy. The nature of its processing depends on the type of data being processed. The subsystem is mostly transparent when processing large content files such as digital movies or music – the volume access subsystem does most of the work on these files.
    The file access subsystem becomes important when the system works with numerous, small content files. For example, if a customer uses a DirectPath system for hosting Web sites, each site will have a number of relatively small files, and the file access subsystem would process the individual files in the site (HTML, GIF, etc.).
    One key feature of the DirectPath file system is that virtually all of the content transfer from disk happens in the volume access subsystem rather than the file access subsystem. This gives the system a speed advantage over traditional file servers.
  • 14. Access to smaller, named files
    The volume access subsystem sees large blocks of data with no internal structure – for example, digital movie files that are delivered to users from beginning to end. The file access subsystem gives the system access to many smaller named files, such as the files that make up a Web site.
    [Figure: where the volume access subsystem sees one large volume, the file access subsystem might see a number of smaller named files (f1, f2, f3, f4, f5, ..., fn).]
    Block search services for the volume access subsystem
    To find files, the file system has several different directories:
    • Inode directory – an inode is a system data structure that describes a file. An application references a file by giving the file system an inode number.
    • URL directory – this is a table that maps URLs to inodes, in effect providing URL “names” for the inodes.
    • Traditional file system directory – another inode mapping table, but one that mimics the hierarchical tree structure of subdirectories and files commonly used in PCs and UNIX machines. These directories also point to (and “name”) inodes.
    There are three basic methods for reading content, depending on the nature of the files involved:
    • Locate method – the client task wants to read from a certain offset into a volume, which it has a handle to. This method is for large files. Here is a typical sequence of events:
    [Figure: locate method message sequence among the client, file system session, volume system session, and storage node, with numbered steps 1 through 11.]
  • 15. 1: Open request (client sends either a file name or URL).
    2: Open reply (returns a handle to the file, if found).
    3: Locate request.
    4: Locate reply (returns a map of the file’s block segments).
    5: Volume read request.
    6: Sector read request.
    7: Data transfer to client buffer (an RDMA transfer).
    8 & 9: Request replies.
    10: Close request.
    11: Close reply.
    Steps 5 through 9 repeat until the client has received the entire file.
    • Whole file method – for quick access to files small enough to be fully retrieved in one read operation (such as Web site files). One benefit of this method is that there are no file open or close operations.
    [Figure: whole file method message sequence among the client, file system session, volume system session, and storage node, with numbered steps 1 through 7.]
    1: Read file request (client sends either a file name or URL).
    2: File system session passes read request along.
    3: Sector read request.
    4: Data transfer to client buffer (an RDMA transfer).
    5, 6, 7: Request replies.
    • Traditional method (with Ikadega enhancements) – this method supports file reads as done on a UNIX system. The method also supports traditional file operations such as renaming, setting permissions, etc., and it supports DAFS.
    [Figure: traditional method message sequence among the client, file system session, volume system session, and storage node, with numbered steps 1 through 11.]
    1: File open request.
    2: Open request reply.
    3: File read request.
    4: Read request passed along.
    5: Sector read request.
    6: Data transfer to client buffer (an RDMA transfer).
    7, 8, 9: Request replies.
    10: File close request.
    11: File close reply.
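    To make the directory lookups and the whole file method more concrete, here is a small Python sketch. It assumes an inode records a volume name, a byte offset, and a length; that layout, the sample URL and path, and the in-memory `VOLUMES` dictionary standing in for storage nodes are all illustrative assumptions rather than the DirectPath data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inode:
    """Hypothetical inode: where a named file lives inside a volume."""
    volume: str
    offset: int
    length: int


# Inode directory: inode number -> inode.
INODES = {1: Inode("web", 0, 5), 2: Inode("web", 5, 6)}
# URL directory and traditional directory both "name" inodes.
URL_DIR = {"http://example.com/index.html": 1}
PATH_DIR = {"/site/logo.gif": 2}
# One volume holding two small files back to back (stand-in for disk sectors).
VOLUMES = {"web": b"helloGIF89a"}


def resolve(name: str) -> Inode:
    """Map a URL or a path to its inode via the two naming directories."""
    inode_no = URL_DIR.get(name)
    if inode_no is None:
        inode_no = PATH_DIR.get(name)
    if inode_no is None:
        raise FileNotFoundError(name)
    return INODES[inode_no]


def read_whole_file(name: str) -> bytes:
    """Whole file method: a single request by name, with no open or close."""
    inode = resolve(name)
    data = VOLUMES[inode.volume]
    return data[inode.offset:inode.offset + inode.length]


page = read_whole_file("http://example.com/index.html")
```

    The volume itself carries no file boundaries; only the inodes say where one named file ends and the next begins.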
  • 16. The “Ikadega enhancements” mentioned above include the direct RDMA content transfer from the storage node to the client. Traditional file systems would send the content to the client through the volume and file system sessions.
    For added flexibility, clients may shift between the locate and traditional methods with the same file handle.
    Note: These drawings assume that the directories are fully cached in memory and that files are stored contiguously on disk.
    The volume access subsystem
    The volume access subsystem supports the file access subsystem and UNIX file system. Volumes are logical collections of sectors, often organized into categories of content stored on the system, such as the top N most popular titles and the other less-popular files.
    Most volumes contain large content files sized from the hundreds of megabytes to gigabytes and beyond. Volumes generally have fewer attributes than files – they do not have items such as access permission data, modification and access dates, checkpoint information, etc.
    The volume access subsystem is where you first start to see disk organization. A disk has one or more disk slices, each of which contains partitions. Partitions cannot cross disk slice boundaries. There also is a partition descriptor for each partition in a slice.
    [Figure: disk layout. Labels: disks; partitions; disk slice boundaries; partition descriptors.]
    Disk slices help support the work of offline utility applications. One example of such an application is a program that pre-loads content before disks are shipped. The system formats disk slices like conventional operating system partitions.
    Note to readers who are familiar with the system’s traffic shaping components: You can think of the volume access subsystem as a part of the components responsible for storage array control and fabric traffic.
  • 17. Aggregation
    One method for splitting volumes into partitions is aggregation. This is simply breaking the content into partitions, which can reside on different disks or storage nodes.
    [Figure: a 100-gigabyte original volume split into a 40 GB partition and a 60 GB partition on disk 2 and disk 8.]
    With checkpointing or replication, the aggregation can have different boundaries for each version:
    [Figure: two versions of the same volume with different partition boundaries: 40 GB + 60 GB, and 40 GB + 30 GB + 30 GB.]
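    A short Python sketch of how an offset into an aggregated volume might be mapped to a partition. The function name `locate`, the decimal gigabyte constant, and the assignment of the 40 GB partition to disk 2 and the 60 GB partition to disk 8 are assumptions made for illustration.

```python
GB = 10**9  # decimal gigabytes, chosen here only to keep the numbers readable

# Ordered partitions of one 100 GB volume: (disk name, partition size in bytes).
PARTITIONS = [("disk 2", 40 * GB), ("disk 8", 60 * GB)]


def locate(partitions, volume_offset):
    """Return (disk, offset within that disk's partition) for a volume offset."""
    if volume_offset < 0:
        raise ValueError("offset must be non-negative")
    start = 0
    for disk, size in partitions:
        if volume_offset < start + size:
            return disk, volume_offset - start
        start += size
    raise ValueError("offset is beyond the end of the volume")


first_byte = locate(PARTITIONS, 0)
boundary = locate(PARTITIONS, 40 * GB)   # first byte stored on the second disk
```

    A different version of the same volume (for example 40 GB + 30 GB + 30 GB) would simply use a different partition list with the same lookup.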
  • 18. Introduction to DirectPath subsystems PROPRIETARY and CONFIDENTIAL. NDA REQUIRED. 8/14/2001 10:54 AM Page 18 Copyright Ikadega, Inc. All rights reserved. Striping Striping is another way the system splits content files into partitions. Striping is a disk storage technique that helps to protect against lost content. It splits up a content file into equal-length blocks called stripes. The system stores these stripes on N different storage nodes (here N = 4), along with an additional stripe described below: 0 1 2 3 4 (^ = exclusive OR) 5 6 7 0 1 2 3 4 5 6 7 Storage node 1 Storage node 2 Storage node 3 Storage node 4 Original volume contents 0^1^2^3 4^5^6^7 Storage node 5 Parity stripe Each byte in the parity stripe (at the bottom of the drawing) is the result of an exclusive OR logic operation on the bytes in the corresponding stripes. For example, the first byte of the parity stripe is the result of an exclusive OR performed on the first bytes of stripes 0 through 3. If the system can’t read one of the stripes (say if there is a disk or storage node error), it can re-create the lost data by comparing the values in the parity stripe with those in the remaining stripes. The hardware layer This layer contains the hard disks and their controlling hardware and software, all located on storage nodes. The layer doesn’t know the meaning of the data it reads and writes. It just responds to specific commands. Most of the work it does is read operations, but it does write to disk as well, to load new content or make copies of volumes. Every storage node has multiple sub-nodes (two of them at present), each of which controls one ATA-type hard disk drive. The sub-nodes have custom disk-controlling hardware as well as interfaces to the fabric. Since the whole DirectPath system is designed to keep the disk drives as busy as possible, the system makes heavy demands on the drives, and there is very little room for malfunctions or disk errors. 
Field service people can replace disks “on the fly” (while the rest of the system continues to run and deliver content) to remove a faulty disk, install a drive with more capacity, or insert a disk pre-filled with new content.
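The parity scheme described under Striping can be sketched in a few lines. This is a minimal illustration of exclusive OR parity in general, not DirectPath’s implementation; the function names and the four-byte stripes are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise exclusive OR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving, parity):
    """Recover the one missing stripe from the survivors and the parity stripe."""
    return xor_blocks(surviving + [parity])


stripes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # stripes 0 through 3
parity = xor_blocks(stripes)                    # would live on storage node 5
lost = stripes[2]                               # pretend storage node 3 failed
recovered = rebuild(stripes[:2] + stripes[3:], parity)
```

Because exclusive OR is its own inverse, combining the parity stripe with the three surviving stripes yields the missing one. The scheme can recover only one lost stripe per parity group.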
  • 19. To assist the disk activity scheduler, DirectPath maintains a set of performance history data on each disk. This data reflects the actual performance of each individual drive (rather than the specifications for the drive type).

Checkpoints

Checkpoints allow the system to keep multiple file versions on disk, mostly to ensure read consistency – each end user gets all of their files from the same file set. This is useful in many applications, especially with frequently updated files such as Web files. If a site is popular, a number of people might be using it when one of its file updates takes place. If a site changes frequently, at some point there will be users with files open from several revisions ago, especially users with slow Internet connections. With checkpointing, the new and older versions of the site co-exist while the system loads new files to the disk drives.

The files for a new checkpoint become available to users when the file system commits them. A commit operation updates the partition descriptors for the volume. New users aren’t able to use the new checkpoint until all of the new files are committed successfully. Users that had site files open at the start of the update see only files from the checkpoint that was most current when they started their sessions.

If the system halts during the loading stage (before it can commit a new checkpoint), the checkpoint and its new content are lost. The system retains the previously committed checkpoints, though.

The DirectPath customer can specify how many checkpoints to keep for each volume. The system generally re-uses the storage space of expired versions.
This can be a rapid process – on some systems that change content very quickly, the resources for a replaced checkpoint may be re-used in as little as 4 minutes.

The checkpoint feature is implemented in the volume access subsystem, but DirectPath only uses it on the smaller named files processed by the file access subsystem. At any given moment, a checkpointed volume has files from 0 to n checkpoints available, and it may also have a new checkpoint in progress.

A simple example

To understand checkpointing, take an example volume that only has five files. (The example is small and unrealistic, but it demonstrates the basics of how checkpointing works.) Suppose there is a DirectPath system that hosts Web sites, and it receives five files for the initial version of a site. The files are a.html, b.html, c.html, d.gif, and e.gif.
  • 20. When the new files arrive, if they are to go in a new volume, the application software allocates a certain amount of space for the volume. At this point there is officially nothing in the volume – none of the blocks is committed, though the file system may have started loading the files to disk.

[Figure: the volume’s allocated space, with data being stored to disk.]

At this stage, the files for the example Web site are on their way to disk, but they are not yet available to users.

When the files are stored successfully, the file system can commit the checkpoint. After it does this, new users open the files from this first checkpoint:

a.html (1..latest)
b.html (1..latest)
c.html (1..latest)
d.gif (1..latest)
e.gif (1..latest)

Most recently committed: 1
Oldest retained checkpoint: 1

The (m..n) notation indicates the checkpoints each file belongs to. (The system does not store this information with the files, however – it maintains it in memory only.) Oldest retained checkpoint and Most recently committed are two variables the system uses to keep track of a volume’s checkpoints.
  • 21. Now, suppose sometime later there is a change to the a.html file, where a completely new version of the file replaces the first version. The system stores the new version in the first available free space, and then commits it:

a.html (1) [invalidated]
b.html (1..latest)
c.html (1..latest)
d.gif (1..latest)
e.gif (1..latest)
a.html (2..latest) [new]

Most recently committed: 2
Oldest retained: 1

At this point, the volume contains files from checkpoints 1 and 2. The commit invalidates the first version of a.html, which means that the file is still there but is no longer in the latest checkpoint. However, the file is still valid for sessions using checkpoint 1, if any. If certain conditions are met later (see below), the file system could eventually re-allocate the storage used by this first version of a.html.

The users are not aware of the checkpointing or the different file versions.
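The (m..n) notation can be read as a visibility rule: a session pinned to checkpoint k sees exactly the file versions whose range includes k. The sketch below applies that rule to the example volume after checkpoint 2 commits. The tuple layout, the `LATEST` marker, and the `files_for` function are illustrative assumptions, not DirectPath data structures.

```python
LATEST = None  # stands for "..latest": the version has not been invalidated

# (file name, first checkpoint, last checkpoint) after checkpoint 2 commits.
VOLUME = [
    ("a.html", 1, 1),
    ("b.html", 1, LATEST),
    ("c.html", 1, LATEST),
    ("d.gif", 1, LATEST),
    ("e.gif", 1, LATEST),
    ("a.html", 2, LATEST),
]


def files_for(checkpoint):
    """Map each file name to the index of the version a session on `checkpoint` sees."""
    view = {}
    for index, (name, first, last) in enumerate(VOLUME):
        if first <= checkpoint and (last is LATEST or checkpoint <= last):
            view[name] = index
    return view
```

A session that started on checkpoint 1 keeps resolving a.html to the original version (index 0), while a session on checkpoint 2 gets the replacement (index 5). Both sessions see the same b.html, c.html, d.gif, and e.gif.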
  • 22. Suppose now that there are two file changes for the next checkpoint. The site owner changes the b.html file, which had been the only file referencing d.gif. The new b.html no longer uses the graphic file. In addition to invalidating the old b.html, the system invalidates d.gif – the file does not apply to the new checkpoint or the ones that follow it (unless one of the HTML files is changed to refer to it again). d.gif and the original b.html are still valid for users of checkpoints 1 and 2.

Here’s what the volume looks like after the commit:

a.html (1)
b.html (1..2)
c.html (1..latest)
d.gif (1..2)
e.gif (1..latest)
a.html (2..latest)
b.html (3..latest)

Most recently committed: 3
Oldest retained: 1

What eventually happens to the previous version of b.html, d.gif, and other invalidated files depends on the customer’s file allocation policies. The system’s operators probably want to keep files from at least some of the previous checkpoints, in which case these files would remain there unchanged. However, to keep disk clutter down, most customers also want to limit the number of checkpoints remaining on disk. So if a customer chooses a checkpoint limit, it affects what the file system does when there is a new checkpoint.

If the following conditions are both true, then the file system could mark a file reclaimable (discarded and available for re-use by new files):

• There are currently no sessions using the checkpoint in question, and
• The file’s checkpoint number is no newer than the new checkpoint’s number minus the checkpoint limit (for example, with a limit of 5 and committing checkpoint 8, a file from checkpoint 3, 2, or 1 qualifies, since 8 - 5 = 3)

The file system marks a file as reclaimable only if both conditions are true for it.
If the file system marks a file as reclaimable, its blocks no longer contain valid data, though the system might not re-use them right away.
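The two reclaim conditions can be written as a single predicate. This is a sketch under stated assumptions: the function and parameter names are invented for illustration, the file’s checkpoint number is taken to be the last checkpoint the file belongs to, and the cutoff follows the worked example in the text (limit of 5, committing checkpoint 8, so checkpoints 3, 2, and 1 qualify).

```python
def is_reclaimable(file_checkpoint, new_checkpoint, limit, checkpoints_in_use):
    """Return True if a file from `file_checkpoint` may be marked reclaimable.

    checkpoints_in_use is the set of checkpoint numbers that still have
    active user sessions (a hypothetical input for this sketch).
    """
    # Condition 1: no session is still using the file's checkpoint.
    unused = file_checkpoint not in checkpoints_in_use
    # Condition 2: the checkpoint falls outside the retention window.
    too_old = file_checkpoint <= new_checkpoint - limit
    return unused and too_old
```

With a limit of 5 and checkpoint 8 being committed, a file from checkpoint 3 is reclaimable once no session uses checkpoint 3, while a file from checkpoint 4 is kept either way.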
  • 23. Checkpointing states and transitions

This document describes some of the data the DirectPath file system uses to process checkpoints. It also shows how these variables change in response to various checkpoint events.

Note: The information in this document applies only to customer environments with relatively small numbers of content providers. This document does not apply to environments where there are numerous content providers.

The volume access subsystem and file access subsystem maintain the following variables to support checkpointing:

• For each volume:
  o OldestRetained – the number of the earliest checkpoint the system must still honor (retain the files for). Recall from About checkpoints that some end users could still be using files from one or more previous checkpoints.
  o LatestCommitted – the number of the most recently committed checkpoint for a volume.
  o OldestBeingUsed – the oldest checkpoint that still has active user sessions.
• For each file:
  o Modified – the number of the checkpoint containing the latest version of a particular file.
  o Invalidated – the checkpoint when the file was removed from the latest checkpoint, though it still may be in use by active user sessions.

Note: The DirectPath file system currently tracks these two variables for each file. It could in the future track them by block instead.

Checkpointing states

The following table shows the different states a file can be in. Note, though, that the DirectPath file system (DPFS) does not store these file states anywhere. The state of each file is implied by the values of the above variables.
The reason for this is system performance and consistency – if a checkpoint that affects, say, 200 files aborts before committing, the system would be slowed down by first marking and then un-marking all 200 files. Also, if the system halts in the middle of a 200-file update, the file system would not be able to correctly tell which files are members of each checkpoint when it comes back up.
  • 24. There are abbreviations in the tables: F for file and V for volume. For example, F.Modified means the Modified variable for a file, and V.LatestCommitted is the LatestCommitted value for a volume.

Implied state | Values causing that state | Comment
Free | F.Invalidated <= V.OldestRetained | When the DirectPath file system wants to create new files for a checkpoint, it first allocates free space for them.
Newly allocated | F.Modified > V.LatestCommitted; F.Invalidated == NULL | This is the status of a new file being created for a future (not yet committed) checkpoint. If the file system commits the checkpoint, the file’s state becomes In-use retained. If the system instead aborts the checkpoint, it makes the file’s resources free again for re-use.
In-use retained | F.Modified <= V.LatestCommitted; F.Invalidated == NULL | This is a normal file state – it means that the file is part of the volume’s most recently committed checkpoint.
Invalidation pending | F.Invalidated > V.LatestCommitted | In this state, the file system is in the process of creating a checkpoint that, when committed, will invalidate the file. If the commit operation completes, the status changes to Retained. If the system does not commit the new checkpoint, it eventually returns the file to the In-use retained state (sometime before it commits the next checkpoint).
Retained | F.Invalidated > V.OldestRetained; F.Invalidated <= V.LatestCommitted | A file in this state has been invalidated. It is no longer in the volume’s latest checkpoint, but it is still part of an older checkpoint that has current users. When all of this checkpoint’s users end their sessions, the system could delete the file (put it in the Free state, in a sense) and re-use its resources, depending on its checkpoint retention policy.
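Because the states are implied rather than stored, they can be derived on demand from the variables. The sketch below transcribes the table’s conditions into one function. The class and field names are assumptions for illustration, `None` plays the role of NULL, and the order of the checks is a choice made here, since the table does not say which row wins if more than one matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    oldest_retained: int
    latest_committed: int


@dataclass
class File:
    modified: int
    invalidated: Optional[int] = None  # None plays the role of NULL


def implied_state(f: File, v: Volume) -> str:
    """Derive a file's implied checkpoint state from the table's conditions."""
    if f.invalidated is None:
        if f.modified > v.latest_committed:
            return "Newly allocated"
        return "In-use retained"
    if f.invalidated <= v.oldest_retained:
        return "Free"
    if f.invalidated > v.latest_committed:
        return "Invalidation pending"
    return "Retained"
```

For the example volume after checkpoint 3 commits (OldestRetained = 1, LatestCommitted = 3), the first a.html (Modified = 1, Invalidated = 2) comes out as Retained, and the new b.html (Modified = 3) as In-use retained. Nothing has to be rewritten per file when a checkpoint commits or aborts, because the state is recomputed from the volume variables.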
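State transitions

This drawing shows the checkpoint states a typical file goes through during its life span:

[Figure: state diagram with the states Newly allocated, In-use retained, Newly invalidated, Retained, and Free. Transitions labeled "commit" lead from Newly allocated to In-use retained and from Newly invalidated to the decision "Too old to retain?" (is F.Invalidated < the smaller of V.OldestRetained or V.OldestBeingUsed?). The "No" branch leads to Retained and the "Yes" branch leads to Free; a further "commit" transition is shown near Retained and Free.]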
  • 25. The following table shows how the system updates variables and changes implied states for individual files in response to different checkpoint events.

From state | To state | Triggering event | Variables modified
Free | Newly allocated | File allocation | F.Modified := V.LatestCommitted + 1
Newly allocated | Free | File delete | F.Invalidated := V.LatestCommitted + 1
Newly allocated | Free | Checkpoint abort | F.Modified := NULL
Newly allocated | In-use retained | Commit | Increment V.LatestCommitted
In-use retained | Retained | Commit, when V.OldestRetained is still < F.Invalidated | Increment V.LatestCommitted
In-use retained | Free | Commit, when V.OldestRetained is now >= F.Invalidated | Increment V.LatestCommitted
Retained | Free | Commit, when V.OldestRetained is now >= F.Invalidated | Increment V.LatestCommitted
Invalidation pending | In-use retained | Checkpoint abort | F.Invalidated := NULL

Replication

Replication is a disk storage method for protecting against lost content by making complete copies of volumes on different storage nodes. It is also useful when one copy of a content file is not enough to meet the peak demand for the file, and it improves the system’s peak throughput. System managers can use replication to place copies of popular content near the physical outer edges of disk drives, where data is read faster (since more bytes pass under the read/write head in the same amount of time than near the middle).

A replication strategy is how a customer wants to replicate volumes. Replication of content is always a relatively low-priority activity, not important enough to interfere with sending out content.
As a result, there is lagged replication – the system usually finishes the copy operation somewhat after finishing the storage of the original content.
  • 26. Replication and checkpointing

On systems using checkpointing, lagged replication follows behind the commitment of each checkpoint. For example, when the system originally allocates space for a volume, the replicated copy doesn’t exist yet:

[Figure: the original volume, with no copy yet.]

As the original volume grows and changes, the copy might lag behind like this:

Original volume | Copy
CPs included: 1 | (none yet)
CPs: 1, 2 | CPs: 1
CPs: 1, 2, 3 | CPs: 1, 2

At this point, if it needed data from checkpoints 1 or 2, the file system could get it from either the original or the copy (assuming that the file in question has been invalidated in either of these checkpoints).
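The effect of lagged replication on reads can be sketched as a lookup over which replicas already hold a given checkpoint. The function name and the set-based bookkeeping below are assumptions made for illustration; the source does not say how DirectPath tracks replica progress.

```python
def sources_for(checkpoint, original_cps, copy_cps):
    """List the replicas that already hold the requested checkpoint."""
    sources = []
    if checkpoint in original_cps:
        sources.append("original")
    if checkpoint in copy_cps:
        sources.append("copy")
    return sources


# Final row of the example: the copy lags one checkpoint behind.
original_cps = {1, 2, 3}
copy_cps = {1, 2}
```

Requests for checkpoint 1 or 2 can be served from either replica, while checkpoint 3 is available only from the original until the copy catches up.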
  • 27. Content transfer engine subsystem

A content transfer engine (CTE) is a platform that supports a number of content transfer daemons (CTDs). In the initial implementation, it works with CTDs in the Internet delivery subsystem. Together the two subsystems send data to Internet users.

A supplemental processor initializes and oversees the CTE. Sometimes these two components are on the same access node:

[Figure: an IP access node containing both the content transfer engine and the supplemental processor, connected to the fabric and the Internet.]

Here the supplemental processor is on a separate supplemental processor node:

[Figure: a supplemental processor node containing the supplemental processor, and an IP access node containing the content transfer engine, connected to the fabric and the Internet.]

The CTE is implemented as a set of fixed-function engines and re-programmable micro-engines, all on a field-programmable gate array (FPGA). The engine’s major functions are buffer management and two communication interfaces going to the fabric and the Internet.
  • 28. Inside the CTE

This drawing shows the CTE’s internal structure:

[Figure: the Content Transfer Engine contains a node-fabric interface engine (connected to the fabric), an external network interface engine (connected to the external network), an event engine, event queues (EventQ), and buffer and memory control attached to external RAM.]

There are multiple event engines, though perhaps only a few. Each event engine has its own queue. The event engines receive notices about events to be processed by particular CTDs. The event engine dispatches an event by waking up the correct CTD to process it:

[Figure: in the content transfer engine subsystem, event queues n, n-1, ... feed event engines n, n-1, ...; the event engines send dispatch signals to CTDs in the Internet delivery subsystem.]

The CTE assigns an arriving event to an event engine in this way:

• If the CTE currently has no other pending events for the particular CTD, the event goes to a randomly chosen event engine.
• However, if the event queues already contain at least one event for that CTD, then all events for that CTD must go through the same event engine, until they are all dispatched.

One of the critical issues for the content transfer engine is memory management. The CTE makes a minimum of memory transfers as it receives data from storage nodes and sends it to the external network. It does this by initially storing data from the access nodes into external RAM buffers, then sending it to the Internet from those same buffers.
  • 29. A supplemental processor runs a component called the content transfer engine extension (CTEX). CTEX runs extended content transfer daemons (XCTDs), which extend the content processing functions beyond the CTE’s limited processing scope. For example, XCTDs have more flexible access to shared data than CTDs do.
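The event assignment rule described for the CTE (a random engine for a CTD with no pending events, otherwise the engine already holding that CTD’s events) can be modeled in software as follows. The real CTE is FPGA logic, so this is only a behavioral sketch; the `EventRouter` class, its method names, and the per-CTD counter are assumptions made for illustration.

```python
import random
from collections import deque


class EventRouter:
    """Behavioral model of assigning CTD events to event engine queues."""

    def __init__(self, engine_count, rng=None):
        self.queues = [deque() for _ in range(engine_count)]
        self.pending = {}  # CTD id -> (engine index, number of queued events)
        self.rng = rng or random.Random()

    def assign(self, ctd, event):
        """Queue an event and return the index of the engine that received it."""
        if ctd in self.pending:
            # Events already queued for this CTD: stay on the same engine.
            engine, count = self.pending[ctd]
        else:
            # No pending events for this CTD: pick an engine at random.
            engine, count = self.rng.randrange(len(self.queues)), 0
        self.queues[engine].append((ctd, event))
        self.pending[ctd] = (engine, count + 1)
        return engine

    def dispatch(self, engine):
        """Remove and return the oldest (ctd, event) pair from one engine's queue."""
        ctd, event = self.queues[engine].popleft()
        _, count = self.pending[ctd]
        if count == 1:
            del self.pending[ctd]  # the CTD is free to move to any engine again
        else:
            self.pending[ctd] = (engine, count - 1)
        return ctd, event
```

Keeping a CTD’s pending events on one queue preserves their arrival order, which is presumably the point of the rule, while the random choice spreads unrelated CTDs across the engines.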