The document discusses new capabilities for managing and analyzing unstructured data in SQL Server 2012. Key points include:
- SQL Server 2012 introduces FileTable, which allows storing and accessing files and folders through standard file system APIs while keeping the file data and metadata in SQL Server tables.
- Full-text search is improved with better performance and the ability to scale to hundreds of millions of documents. New capabilities such as property search and a customizable NEAR operator are also introduced.
- Semantic search extracts keywords and identifies related content based on statistical analysis, without requiring ontologies. It provides insight into unstructured text content.
2. MY FAVORITE BEYOND RELATIONAL APPLICATION
Structured and unstructured search
Related/"semantic" search
3. BEYOND RELATIONAL DATA
Building and maintaining applications with relational and non-relational data is hard
Pain Points
Complex integration
Duplicated functionality
Compensation for unavailable services
Goals
Reduce the cost of managing all data
Simplify the development of applications over all data
Provide management and programming services for all data
4. RICH UNSTRUCTURED DATA IN SQL SERVER 2012
• 80% of all data is not stored in databases! Most of it is "unstructured"
• Make SQL Server the preferred choice for managing unstructured data and allow building rich application experiences on top
• Address important customer requests for capabilities and rich services for Rich Unstructured Data (RUDS)
o Scale up storage and search to 100–500 million documents
o Easy use of and access to unstructured data from all applications
o Rich insight into unstructured data to make better decisions
8. FILETABLE OVERVIEW
• FileTable: a table of files/directories
• User-created table with a fixed schema that contains FILESTREAM data and file attributes
• Each row represents a file or a directory
• System-defined constraints maintain the integrity of the directory tree
• File/directory hierarchy is exposed through a Windows share
• Supports Win32 APIs for file/directory management
• DB storage is transparent to Win32 applications
• SMB level of application compatibility
• Virtual network name (VNN) path support for transparent Win32 application failover
[Diagram: FILESTREAM share on my_machine\MSSQLSERVER exposing per-database directories (Database1, Database2), each containing FileTable directories such as Documents, Media, and LogFiles, with the user-defined directory structure beneath them]
9. CREATING A FILETABLE
Pre-requisites
Enable FILESTREAM
Create a FILESTREAM share and filegroup
Enable non-transactional access at the DB level
ALTER DATABASE Contoso SET FILESTREAM (non_transacted_access = FULL,
    directory_name = N'Contoso')
Create the FileTable
CREATE TABLE Contoso..Documents AS FILETABLE
    WITH (filetable_directory = N'Document Library')
Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library
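The instance-level "Enable FILESTREAM" prerequisite above can be sketched in T-SQL; a minimal sketch (note that the Windows share name itself is set through SQL Server Configuration Manager, not T-SQL):

```sql
-- Access level 2 enables both T-SQL and Win32 streaming access;
-- 1 would allow T-SQL access only, 0 disables FILESTREAM
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
```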
10. MODIFYING A FILETABLE
FileTable has a fixed schema
Columns and system-defined constraints cannot be altered/dropped
User-defined indexes/constraints/triggers are allowed
Disabling/enabling the FileTable namespace
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
Disables all system-defined constraints and Win32 access to the FileTable
Useful for bulk loading/reorganization of data
A FileTable can be dropped like any other table
Catalog views can be used to obtain metadata
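The disable/enable pair described above can be sketched end to end, using the Documents FileTable from the earlier example:

```sql
-- Disable Win32 access and the system-defined namespace constraints
-- so rows can be bulk-loaded without per-row hierarchy validation
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE;

-- ... perform bulk load / reorganization here ...

-- Re-enable: the namespace constraints are validated against existing data
ALTER TABLE Documents ENABLE FILETABLE_NAMESPACE;
```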
11. DATA ACCESS – FILE SYSTEM ACCESS
The FileTable hierarchy is visible through the FILESTREAM share
\\<machine>\<FILESTREAM share>\<Database_directory>\<FileTable_directory>\...
Provides transparent Win32 API and file/directory management capabilities
e.g. MS Word can create/open/save files; xcopy can copy directory trees into the database
Win32 API operations are non-transactional
Operations cannot be part of any user transactions
Win32 operations are intercepted by SQL Server at the file system level
e.g. file/directory creation/deletion => insert/delete into the FileTable
Full locking/concurrency semantics with other accesses
Allows in-place update of file stream data and file attributes
Transactional FILESTREAM APIs can also be used
12. DATA ACCESS – T-SQL ACCESS
Normal insert/update/delete is allowed for FileTable manipulation
FileTable namespace integrity constraints are enforced
Set-based operations on the file attributes are a value-add
Built-in functions
GetFileNamespacePath() – UNC path for a file/directory
FileTableRootPath() – UNC path to the FileTable root
GetPathLocator() – path_locator value for a file/directory
DDL/DML triggers are supported
DML triggers on a FileTable cannot update any FileTables
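A minimal sketch combining the built-in functions above (table and database names follow the earlier Contoso example; the UNC path in the second query is a hypothetical placeholder):

```sql
-- Full UNC path for every file (not directory) in the Documents FileTable
SELECT name,
       FileTableRootPath(N'Documents') + file_stream.GetFileNamespacePath()
           AS full_unc_path
FROM Documents
WHERE is_directory = 0;

-- Resolve a known UNC path back to its row via its path_locator
DECLARE @loc hierarchyid =
    GetPathLocator(N'\\my_machine\MSSQLSERVER\Contoso\Document Library\report.docx');
SELECT * FROM Documents WHERE path_locator = @loc;
```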
13. MANAGING FILETABLE
DB backup/restore operations include FileTable data
A point-in-time restore may contain more recent FILESTREAM data due to non-transactional updates during backup
FileTables are secured like any other user tables
The same security is enforced for Win32 access as well
Data loading
Windows tools such as xcopy/robocopy or drag-and-drop through Windows Explorer can be used
BCP operations are supported for direct T-SQL data inserts
SSMS supports FileTable creation/exploration
14. MANAGING FILETABLE – HIGH AVAILABILITY
SQL Server 2012 AlwaysOn is fully supported
Transparent data failover
FileTables can be configured with multiple secondary nodes
Both sync and async data replication are supported
File data and metadata are available on the secondary in case of failover
Transparent application failover
Virtual network name (VNN) path support for transparent Win32 application failover
Applications use \\VNN\Share\db... paths
Applications are automatically redirected to the secondary in case of failover
Restrictions
FileTables cannot participate in read-only replicas
15. FILETABLE RESTRICTIONS
FileTables cannot be partitioned
Merge/transactional replication is not supported
Under RCSI/snapshot isolation modes, applications cannot modify file stream data in FileTables
Win32 application compatibility: memory-mapped files, directory notifications, and links are not supported
16. UNSTRUCTURED DATA SCALE-UP
MULTIPLE CONTAINERS FOR FILESTREAM DATA
SQL 2008 R2
Only one storage container per FILESTREAM filegroup
Limits storage capacity scaling and I/O scaling
SQL Server 2012
Support for multiple storage containers per filegroup
DDL changes to CREATE/ALTER DATABASE statements
Ability to set max_size for the containers
DBCC SHRINKFILE EMPTYFILE support
Scaling flexibility
Storage scaling by adding additional storage drives
I/O scaling with multiple spindles
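The DDL changes above can be sketched as follows (database, file, and filegroup names as well as the drive paths are illustrative):

```sql
-- Add a second FILESTREAM container (a directory path) to an existing
-- FILESTREAM filegroup, capping its size
ALTER DATABASE Contoso
ADD FILE (NAME = N'ContosoFS2',
          FILENAME = N'E:\FSData\ContosoFS2',
          MAXSIZE = 100GB)
TO FILEGROUP ContosoFSGroup;

-- Drain a container with DBCC SHRINKFILE EMPTYFILE, then remove it
DBCC SHRINKFILE (N'ContosoFS1', EMPTYFILE);
ALTER DATABASE Contoso REMOVE FILE ContosoFS1;
```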
17. UNSTRUCTURED DATA : MULTIPLE CONTAINERS
Use of multiple spindles for achieving better I/O Scalability
18. RUDS SCALE-UP: FILESTREAM PERF/SCALE
Improved performance of T-SQL and file I/O access
Various enhancements to improve read/write throughput
5x increase in read throughput
Linear scaling with a large number of concurrent threads
19. SUMMARY: FILETABLE
Application compatibility for Windows applications
Windows applications run on top of files stored in FileTables with no modifications
Relational value proposition
Provides integrated administration and services
Backup, log shipping, HA-DR, full-text and semantic search, …
T-SQL orthogonality
File/folder attributes surfaced through relational columns
Power of set-based operations, policy management, reporting, etc.
File namespace hierarchy management
20. FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012
Improved performance and scale:
Scale-up to 350M documents
iFTS query performance 7–10 times faster than in SQL Server 2008
Worst-case iFTS query response times < 3 sec for the corpus
On par with or better than the main database search competitors
New functionality:
Property search
Customizable NEAR
New word breakers: updated existing word breakers; added Czech and Greek
Innovation in search:
Semantic Similarity Search
21. FULLTEXT SEARCH PERFORMANCE & SCALE IMPROVEMENTS
Architectural improvements
Improved internal implementation
Queries no longer block index updates
Improved query plans:
Better plans for common queries
Full-text predicate folding
Parallel plan execution
Index and queries tested at scale up to 350 million documents with ~2 sec or better response times
~3x better throughput without DML and ~9x better with DML
Scales easily with an increasing number of connections
22. SCALE-UP: FULL-TEXT SEARCH (2005/8 VS 2012)
Queries over a 350M-document database with random DMLs running in the background.
Beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput.
[Chart: throughput, SQL Server 2005/2008 vs 2012]
23. SCALE-UP: FULL-TEXT SEARCH (2005/8 VS 2012)
Average query execution time (ms) under varying numbers of connections (50–2000 users) for a customer playback benchmark.
[Chart: avg execution time, SQL Server 2005/2008 vs 2012]
24. FULLTEXT PROPERTY SCOPED SEARCH
New Search Filter for Document Properties
CONTAINS (PROPERTY ( { column_name }, 'property_name' ), „contains_search_condition‟ )
• Set up once per SQL Server instance to load the Office filters
exec sp_fulltext_service 'load_os_resources',1
go
exec sp_fulltext_service 'restart_all_fdhosts'
go
• Create a property list
CREATE SEARCH PROPERTY LIST p1;
• Add properties to be extracted
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9',
PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
• Create/Alter Fulltext index to specify property list to be extracted
ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];
• Query for properties
SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');
25. FULL-TEXT CUSTOMIZABLE NEAR
OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near Space')
NEW NEAR USAGES
• SPECIFY DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 5,false)')
• REDUCE DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 2,false)')
• ORDER OF WORDS IS SPECIFIED AS IMPORTANT
select * from fttable
where contains(*, 'near((test, Space), 5,true)')
26. STATISTICAL SEMANTIC SEARCH
Semantic Insight into textual content
Uses language models to find most important keywords in document
No need to build brittle ontologies!
Statistically Prominent Keywords
Autogenerated tag clouds
Potentially Related Content based on extracted Keywords, such as
Similar Products (based on description)
Similar Jobs or Applicants
Similar Support Incidents (based on call logs)
Potential Solutions (based on similar incidents)
First class usage experience
Efficient linear algorithms
Integrated with FTS and SQL
New rowset functions expose all results through standard SQL queries
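The rowset functions mentioned above can be queried like any other table source. A hedged sketch, assuming a table `Documents` whose full-text index was created WITH STATISTICAL_SEMANTICS; the table name, column name, and document key value are illustrative:

```sql
DECLARE @DocId int = 1;  -- hypothetical document key

-- Statistically prominent keywords for one document:
SELECT TOP 10 keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, file_stream, @DocId)
ORDER BY score DESC;

-- Potentially related content: documents similar to @DocId:
SELECT TOP 10 matched_document_key, score
FROM SEMANTICSIMILARITYTABLE(Documents, file_stream, @DocId)
ORDER BY score DESC;
```

Because the results are ordinary rowsets, they can be joined back to the base table to drive "similar products" or "similar incidents" views directly in SQL.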
29. SEMANTIC EXTRACTION: END-2-END EXPERIENCE
• Downloadable Language Statistical Database with registration stored
procedure
• Setup along with Full-Text
• Metadata / Catalog views
• System level DMVs for progress state and usage
• Manageability through SSMS and SMO
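The setup steps above amount to a one-time registration plus ongoing monitoring. A sketch assuming the downloadable semantic language statistics database has been attached under the name `semanticsdb` (the commonly documented name; yours may differ):

```sql
-- Register the language statistics database once per instance:
EXEC sp_fulltext_semantic_register_language_statistics_db
     @dbname = N'semanticsdb';

-- Catalog view: confirm the registration.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;

-- DMV: monitor semantic index population progress.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```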
30. KEY TAKEAWAYS
SQL Server's unstructured data support is a key strategy to
enable you to build complex data applications that go
beyond relational data!
Content and Collaboration, eDiscovery, Healthcare, Document
management etc.
31. RELATED CONTENT
SQL Server 2012 Whitepapers and information:
http://www.sqlserverlaunch.com
Channel 9 DataBound Episode 2: http://channel9.msdn.com
MySemanticsSearch Demo: http://mysemanticsearch.codeplex.com
More demo data sets and demo scripts:
http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Microsoft Virtual Academy Recording: Coming Soon!
Editor's Notes
Let's take a look at a BR application. What services does it provide? What about having these services supported in the database instead of each application building its own?
Examples: an application that manages images in the file system and additional information in the database; building a spatial database application before SQL Server 2008. Example services: backup/restore, search over relational and non-relational data.
SQL Server 2008 introduced FILESTREAM as a way to add large blobs/unstructured data streams into SQL Server while still being able to open a Win32 handle (via a SQL API) with high streaming performance. Win32 namespace support in SQL Server 2012 has the following goals: Reduce the barrier to entry for customers who have data on file servers and Win32 applications that work on it today. With the Win32 namespace enabled, SQL Server exposes a Windows share that existing Win32 applications and mid-tier servers (such as IIS) can use like any file-server share, without having to understand database/transaction semantics. A single integrated set of admin tools: SQL backup/restore, replication, HA solutions, etc. Scale-up: add multiple disks on a machine for storing FILESTREAM data. Use SQL services such as full-text search over both FILESTREAM data and relational metadata, plus property promotion infrastructure for extracting interesting properties from SQL blobs/FILESTREAM data and surfacing them as relational columns for query.
Optimized hot paths; removed unnecessary serialization, expensive file-system operations, etc.