BLOBs and relational database data are very different entities. Relational data usually consists of text or numbers and tends to be small. In contrast, BLOB data is most often pictures in .jpg, .tiff, or .bmp format—such as product images on a Web site—which can be quite large.
There are three (3) approaches to storing unstructured data with SQL Server, RBS, SQL BLOB, and FILESTREAM.Remote BLOB Storage (RBS), SharePoint relies on a new layer in SQL Server to read or update BLOB data stored outside of the database on separate BLOB Stores (File System or dedicated BLOB Stored)SQL BLOB traditional BLOB storage with SharePoint, BLOB data is stored along side the structured metadata in the Content DatabaseFILESTREAM Storage: Improving SQL BLOBGeneral guidance can be summarized as:Storing records that are on average smaller than 256KB is optimized with traditional SQL BLOBStoring large BLOB data like large video files will benefit from FILESTREAM - FILESTREAM will provide better streaming performance on these filesStoring the TIFF images or large static files on remote BLOBs, you should consider to leveraging the new Remote BLOB API layer, i.e. RBS
varbinary is the binary data type designation for binary large objects stored in SharePoint 2010 Content Databases and refers to variable-length binary data. (MAX) refers to a value that max indicates that the maximum storage size is 2^31-1 bytes or otherwise 2GB.
RBS is a SQL Server 2008 add-on that uses auxiliary tables, stored procedures and a managed client library to provide its services. A reference to the blob (provided by the Blob Store) is stored in RBS auxiliary tables and an RBS Blob ID is generated.ISVs create RBS Provider Libraries to enable custom BLOB stores using the RBS API set. The externalization of binary large objects in SharePoint 2010 is often referred to as RBS.
FILESTREAM storage is implemented as a *varbinary(max) column in which the data is stored as BLOBs in the file system.Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.A FILESTREAM filegroup is a special filegroup that contains file system directories instead of the files themselves. These file system directories are called data containers. Data containers are the interface between Database Engine storage and file system storage.FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.Performance: The way the data is going to be used is a critical factor. If streaming access is needed, storing the data inside a SQL Server database may be slower than storing it externally in a location such as the NTFS file system. Using file system storage, the data is read from the file and passed to the client application (either directly or with additional buffering). When the BLOB is stored in a SQL Server database, the data must first be read into SQL Server’s memory (the buffer pool) and then passed back out through a client connection to the client application. Not only does this mean the data goes through an extra processing step, it also means that SQL Server’s memory is unnecessarily “polluted” with BLOB data, which can cause further performance problems for SQL Server operations.FILESTREAM is most appropriate when the following conditions are true:Objects that are being stored are, on average, larger than 1 MB.Fast read access is important.http://technet.microsoft.com/en-us/library/bb933993.aspxhttp://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx*See slide
Storage AttributeA FILESTREAM BLOB is a SQL BLOB, varbinary(MAX) Dual Programming ModelAll the existing code you have on BLOBs can work for a filestream BLOB.We leverage the Windows file systems layer to enable the win32 rich streaming programming model with the same transactional semantics as T-SQL. You open can obtain a real win32 handle to the filestream BLOB and call win32 read/write APIs on it.This is main advantage over SQL BLOB. With your large BLOB data stored now on the filesystem, the streaming performance can be as good as the filesystem’s. You can leverage the file system streaming optimizations options that fit your app’s access patterns.Data ConsistencySQL Server is now on the I/O stack of the streaming that you do on the filestream unstructured data Integrated ManageabilityThe filestream BLOB is still a SQL BLOB; FTS, Indexing, Replication, Log shipping etc.. Works for filestream BLOBs.
BLOBs smaller than 256 kilobytes SQL BLOB suitable, and BLOBs larger than 1 megabyte (MB) are best stored outside the database. For those sized between 256 KB and 1 MB, the more efficient storage solution depends on the read vs. write ratio of the data, and the rate of “overwrite”. Storing BLOB data solely within the database (e.g., using the varbinary(max) data type) is limited to 2 gigabytes (GB) per BLOB and enforced through SharePoint 2010.
Write Once, Read Many (alternatively Write One, Read Multiple or WORM) refers to computer data storage systems, data storage devices, and data storage media that can be written to once, but read from multiple times. DVD/CDRLook through the application tables and find blobs that are no longer referenced by the application.GC time window to define when BLOBs are deleted from the blob store.blobs are "orphans" and can be caused due to aborted transactions, application misbehavior or other failures. Orphan blobs created before the GC time window are deleted from the blob store.Get more info here http://blogs.msdn.com/b/sqlrbs/archive/2008/08/08/rbs-garbage-collection-settings-and-rationale.aspx.
Enterprises have 3 options. Each option has advantages & disadvantages based on your scenario.What are the main tradeoffs? When I choose to store my Jpeg images or video files on Windows file system/file server configuration, I get great streaming performance. But my development cost is high because SharePoint needs to have the logic to keep the structured data in sync with my Jpegs/MP3 files stored outside the db.When I choose to store my Jpeg images or video files on a dedicated Blob store, I can scale and expand my storage space easily. The more I expand it the cheaper it gets to add to it. In this case streaming performance depends on your store and keeping structured and unstructured data in sync requires a complex implementation at the app level. When I choose to store my Jpeg images as SQL VARBINAY(MAX) BLOBs in SQL Database, my Jpeg files are constantly in sync with the metadata. I can also use all the SQL Server database management features on my Jpeg files. What’s the tradeoff in this case? SharePoint has to manage the file size limitation and poor streaming performance if my jpegs grow larger (a paper by Jim Grey points out the SQL BLOBs streaming performance is better than file system streaming perf for files smaller than 250KB, while when the unstructured data is around 1MB and larger the File system streaming performs much better than SQL BLOB Streaming).I’ve also seen large enterprises, designing more than one of these options in the same app and some central module decides in which option to store the new BLOB depending on the storage purpose (how large the BLOB is, how it will be used)Show of handsWho store some or all BLOB data in SQL Server BLOBs (Varbinary(max))?Who store some of the BLOB data on file system?Who store some of the BLOB data on remote dedicated store?
The Office Web Applications cache is used by Word and PowerPoint Web Applications to create a version of a document requested for viewing through the browser improving performance and reducing resource consumption on server machines by making cached versions of a document or presentation available in cases where there are multiple requests for the same document.The Office Web Applications cache occurs in two (2) distinct tiers, on the server file system and within a “specialized” site collection hosted on a per Web application basis. Document or presentation requests made through the Office Web Applications are served through both caches as the images are rendered for client consumption. Both cache locations are used by all site collections within a Web application where the Office Web Applications features activated.For example, when a client requests a document, the document is rendered through an AppServerHost.exe process and the document subsequently is cached to the server file system cache located on each server from which the content is propagated to the site collection cache which exists on a per Web application basis. Subsequent requests for the document are rendered from the site collection cache.
The maintenance tasks associated with Remote BLOB Storage (RBS) are mainly performed through the RBS Maintainer. The RBS Maintainer performs periodic garbage collection and other maintenance tasks for an RBS deployment.GCGarbage collection is how unreferenced or deleted data is removed from the remote BLOB store. Garbage collection in RBS is performed passively. References to BLOBs are counted by looking at the list of BLOB IDs stored by the application in its RBS table columns at the time of garbage collection.RSLook through the application tables and find blobs that are no longer referenced by the application. The list of registered RBS columns is used for this purpose. BlobIds must not be stored in any place other than the registered columns. The blobs that are no longer referenced by the application are marked to be deleted.DPBlobs marked for deletion are actually deleted from the blob store. There is a gap between when the blobs get marked for deletion and when they are actually deleted. This gap duration can be configured using the "garbage_collection_time_window" config item and defaults to 30 days.
Remote Blog Storage (RBS) Best Practices in SharePoint 2010 - EPC Group
Overview Understanding Unstructured Data Storage ◦ SQL BLOB ◦ RBS ◦ FILESTREAM RBS Structure and Mechanics Planning BLOB Storage Deployment and Migration Support EBS Summary
Remote BLOB Storage is designed to delineate structured (metadata) and unstructured (BLOB data) data Enables organizations to deploy more efficient content storage models based on commodity storage ◦ Does not address capacity – database is the sum of the unstructured and structured data regardless of location Provides an upgrade path for WIDE customers Improves existing BLOB storage scenarios (EBS)
On average 20% of data is structured, 80% is unstructured or semi-structured
Does not adhere to specific format or sequence Is not tied to rules and unpredictable Examples: ◦ Text ◦ Video ◦ Audio ◦ Images ◦ Word, PowerPoint, code, etc.
Organized in semantic chunks (entities) Tied to relationships and has attributes Associated with a defined schema: ◦ All entities have the defined format ◦ Have a predefined length
BLOB Binary Large OBject BLOB is the data stream associated with a file ◦ SharePoint file metadata and BLOBs are stored in SQL databases ◦ BLOBs do not participate in query operations ◦ Sample BLOB operations: Get, Put, Read range, etc. SharePoint is built around the file ◦ Document libraries, Record Centers BLOBs generally represent 80% of total content
Web Server Database Server Content Database SharePoint stores BLOBs and associated metadata in the content database
Storage ◦ SQL storage is usually more expensive SAN versus CAS stores Performance ◦ Impacts load on SQL Server box Policy requirements ◦ Expunge, BLOB immutability
Web Server Database Server Content Database BLOB BLOB BLOB Store X Store Y Store Z• Independent component that can be registered for a SharePoint farm• Data store for all BLOBs added to the content databases where the Provider is set Active• Customers can select BLOB store providers
Unstructured Data Unstructured Data Unstructured DataDedicated BLOB Stores SQL BLOB Integrated File + Remote File Servers Database
Binary large objects stored in data tables (varbinary(MAX)) Traditional method of storing and retrieving binary large objects with SharePoint Products and Technologies
SQL Server 2008 Add-on Remote Blob Storage (RBS) is a library API set that is designed to move storage of large binary data (BLOBs) from Microsoft SQL Server to external storage solutions. RBS gives applications the ability to use rich relational capabilities of SQL Server for their structured data along with capabilities of dedicated storage solutions for their unstructured data in a transactionally consistent manner.
Leverages NTFS FS by storing varbinary(max) BLOB data as files on the FS Addresses performance by enabling more memory access for query processing Suited for scenarios where BLOB (unstructured) data is 1 MB or larger and fast read access is desired (i.e. RM scenarios)
Storage attribute onUnstructured Data VARBINARY(MAX) Unstructured data stored directly in the file system (requires NTFS) Dual Programming Model Data Consistency Size limit is the file system volume size Integrated Manageability SQL Server Security Stack (Same asIntegrated Files & Database SQL BLOB)
Unstructured data is Unstructured data is stored in a Filegroup stored in a Filegroup in a separate database or SQL with the associated Server instance from the Content Database with associated Content the related structured Database with the related data on the local SQL structured data Server instance Does not supported integrated Supports integrated management, structured management, i.e. and unstructured data is backup and restore managed separatelyLocal FILESTREAM Remote FILESTREAM
FILESTREAM Provider is limited local storage ◦ DAS, NAS, SAN are considered remote storage regardless of disk presentation ◦ Does not support compression, TDE, and other SQL Server capabilities ◦ Special constraints and limitations apply to BCM scenarios such as Database Mirroring and Log Shipping (see FAQ) 3rd party ISV solutions require SQL Server Enterprise Edition ◦ NAS storage devices require 20ms TTFB
RBS is a downloadable component in the SQL Server 2008 R2 Feature Pack ◦ Includes a set of libraries and interface specifications Defines and exposes 3 views for interaction ◦ Application View Interacts with SharePoint Web Front-end, Provider Library, SQL DB Implemented by SharePoint 2010 – Transparent to the user ◦ Administrator View Windows PowerShell CmdLets – Call Stored Procedures and Functions Installation, configuration, provisioning, RBS Maintainer etc. ◦ Provider View Defines an interface that should be implemented by each BLOB store provider
RBS storage contains two main features: blob storage and blob retrieval. Blobs are immutable so edits are translated in the backend into new blobs. Blobs are only deleted by garbage collection, which scans the content database and blob library, deleting blobs (of a certain age) that are no longer referenced. Where to store items (RBS or inline) is determined at the content database level on the back-end, not the WFE
Business Logic Application Database Server Web Server Return BLOB Read BLOB BLOB Store ResponseUser Request Content Db Get BLOB Id Config Db
Business Logic Application Database Server Web Server Save BLOB Commit Data BLOB Return BLOB Id BLOB Store ResponseUser Request Commit Content BLOB Id & Db Metadata Config Db
mssql_resources.rbs_internal_ta RBS providers store metadata bles in their own tables inside the [internal rbs … content database.Content data]Database … …RBS provider =  AllDocStreams Content RbsId NULL [ id # ] RBS Blob [Inline Content] NULL
DatabaseInformation … … RBS Enabled [1|0] RBS Provider [ providername The provider is set at the ] database level.AllSites … RBS Collection … ID [ id # ] Each site has a separate RBS Provider Logic collection (top-level storage unit). This is the defaultAllDocStreams location for new blobs, and … RBS ID … consulted when adding new [ rbs id (bin blobs. This is a lookup-only data) ] operation. The location of a RBS Provider Logic Blob blob in the AllDocStreams table need not be in the same collection as what the AllSites table default collection.
Select storage solutions designed to support BLOB data storage ◦ Write Once Read Multiple (WORM) devices can pronounce orphan occurrences Prevents deletion of BLOB related SharePoint metadata row ◦ SAN devices support BLOB data replication, locally and over wide area networks using bit mapping. Database mirroring and/or Log Shipping can augment these solutions, but add operation complexity. RBS Maintainer is instrumental in detecting and resolving orphans in the event of failover ◦ Provides Reference Scan (RS), Delete Propagation (DP), and Orphan Cleanup (OC)
RBS can be used in conjunction with commodity storage, i.e. DAS; however, collaborative scenarios should design for IO (RAID 10) ◦ CAPEX and OPEX considerations are critical to realizing ROI Ensure operationally a multi-tiered storage subsystem can be maintained and supported Ensure BCM plans ensure metadata and BLOB data can be kept synchronized ◦ Consider RBS when the following is true: BLOB data files are larger than 256KB on average BLOB data files are at least 80KB and the DB server I/O is a bottleneck ◦ A large number of small BLOBs can decrease performance RBS provides maximum value in archiving and DAM scenarios, particularly large files with infrequent access
Unstructured Data Unstructured Data Unstructured Data Dedicated BLOB Stores File System SQL BLOB Remote File Servers File ServerSolution Dedicated BLOB Store File System/File Server SQL BLOBAdvantages Lower cost per GB at scale Lowest cost per GB Integrated management Scalability & Expandability Streaming Performance Data-level consistencyDisadvantages Complex application Complex application Poor data streaming support development & deployment development & deployment File size limitations Separate data management Integration with structured Highest cost per GB Enterprise-scales only dataExample EMC Centera Windows File Servers SQL Server VARBINARY(MAX) Fujitsu Nearline
Ensure the provider is working within the scope of provided API’s and support policies Evaluate provider characteristics, features, and capabilities Map provider offering to scenario
Required ◦ Implementation of RBS provider interface ◦ Enable multiple provider instances ◦ Guarantee BLOB persistence ◦ Guarantee link-level consistency Recommended ◦ Backup, HA and Disaster recovery capability ◦ Data de-duplication ◦ Expunge, Immutability of BLOBs
1. Enable FILESTREAM2. Provision BLOB Store3. Download and install SQL Server Remote BLOB Store on each database server4. Download and install SQL Server Remote BLOB Store on each front-end Web server5. Enable RBS
Enable/Disable ◦ Enables/Disables usage $database = Get-SPContentDatabase - of RBS with SharePoint WebApplication http://<server> $rbs = GetProviderNames $database.RemoteBlobStorageSettings $rbs.Installed() ◦ List of all registered $rbs.Enable() Providers ◦ Registered Providers are kept track of in Config DB $rbs.SetActiveProviderName($rbs.GetProv iderNames()) SetActiveProvider $rbs ◦ One active provider/BLOB store per Content DB ◦ Other BLOB stores can be used for read operations
Use Windows PowerShell Migrate CmdLet ◦ Moves BLOBs from current location to the current Active RBS Provider store. ◦ Performs a “deep copy” of BLOB, sequentially ◦ Live migration – does not require downtime ◦ Migration can be stopped and resumed Migrate – can be used for upgrade from EBS to RBS
Backup Backup Start Both Backups Complete Restore Start Both Restores are Complete
Provider SQL Server 2008 SQL Server 2008 R22008 Not Supported Not Supported2008 R2 with Supported SupportedFILESTREAMProvider
SPFarm scoped SPContentDatabase scoped Exposed as a COM Exposed as a .NET interface interface (unmanaged) (managed) Does not provide a Provides a configurable configurable maintainer maintainer Supports single provider Supports multiple providers Supports Object Model, Supports Object Model PRIME, and *SQL Server and PRIME backup/restore backup/restore Introduced in SharePoint Deprecated in SharePoint 2010 2010 ◦ Implemented by SQL Server, ◦ Roadmap based on RBS supported with SharePointEBS RBS
Deep copy the binary large objects from EBS either inline or into RBS Internalize EBS binary large objects, re- externalize with RBS
Used by Work and PowerPoint Web Applications ◦ Implemented per Web Application
Database Mirroring does not support FILESTREAM ◦ Log Shipping should augment Database Mirroring for BLOB protection
Successful binary large object implementations require careful planning to ensure expected ROI is realized Binary large object externalization is designed to reduce capital expenditures, it does not resolve capacity or performance constraints Standardized API set allows choice of providers