Evolution of Database Technology. C. Mohan, PhD, IBM Fellow & IBM India Chief Scientist; Member, IBM Software Group, As...
Some of Our Database Research Legacy: Invention of Relational DBMS & SQL; Research prototypes ...
Why We Have Experience with Customers: Over 2 decades of partnership with SWG Toronto & SVL ...
Leveraging Technology and People IMS Development DB2 Development IDS / U2 Development Customer  Requirements IBM Products ...
SVL  DB2 UDB for z/OS & OS/390 IMS Business Intelligence Content Management DB2 Everyplace Red Brick Icing Traditional AD ...
A Spectrum of Data Serving Requirements Platform:   Mobile   Desktop Small Servers   Large Servers  Data Size:   Micro  Co...
Products to Match the Spectrum of Data Serving Needs DB2 Everyplace OLTP Relational Mobile Embedded  Linux PalmOS Symbian ...
DB2 for z/OS: The power and function of an open, industry standard data server with zSeries’ industry leading avai...
Technology Evolution with Mainframe Specialty Engines Integrated Facility for Linux (IFL) 2001 IBM System z9 Integrated In...
Data Challenges: Variety, Velocity, and Volume; New composite applications need data from multiple...
Addressing the Changing Characteristics of Data Actionability Heterogeneity Scale Satellite & Surveillance Images and Vide...
Key Customer Pain Points: Can’t Find Information – Discovery; Can’t combine Information – Integr...
Research in Information and Interaction Drive our leadership technologies for search, structured and unstructured informat...
Worlds of Structured & Unstructured Data Come Together Analytical Complexity Collect Store Retrieve Drill Mine ETL Warehou...
Need for Business Intelligence: Loyalty; Profitability; Buyer Behavior ...
Industry Solutions Deliver Insight On Demand: Law Enforcement (Crime Information Warehouse ...)
OmniFind Key Technologies. Content Crawling: Scalable Web crawler; Data Source crawlers ...
Content Management Portfolio Strategy: Capture, store, and manage all forms of content; Complete...
IBM Content Management Platform Roadmap 4Q2004 1Q2005 2005 2006 … and Beyond WebSphere Portal V5.1 Embeds DB2 Content Mana...
Query Optimization: Industry-Leading Optimization; Extensible – SQL to XQuery!; O...
Unstructured Information Management Architecture: Common Research infrastructure for advancing Text Analysis and NL...
Analytics  bridge the  Unstructured & Structured worlds Unstructured Information UIMA High-Value Most Current Content Fast...
Evolution of Metadata Hierarchical Data Model  Rigid Metadata Single Application Domain Specific Ontologies Flexible Metad...
Information Management Trends: Information Intensive Applications (Shift from transaction-cent...)
STEM is a tool to help scientists and public health officials create and test models for emerging infectious disea...
Metadata-driven Design for Integration Web Service Build These Using These New Business Process New Integrated View Legacy...
Metadata Will Be Used to Facilitate Information and Application Integration: Today – manual integration, custom ha...
 
Evolution of Database Technology - IBM Research | Almaden ...

  • Products from the combined research and development efforts … blended with customer requirements … to meet specific sets of customer requirements. An example would be IDS Express … we had an existing product, IDS … but our customers wanted a lower-cost option with many of the same features as the full IDS product …
  • Among the wide variety of business applications, there is an equally wide range of requirements for serving data. Though not exhaustive, this graphic shows a number of important variables that affect the selection of a data server. While there are sometimes links across these variables (e.g., a very small server embedded in an application on a mobile device would not run on the z/OS operating system), there is quite a large set of requirement combinations among our clients and partners.
  • For decades IBM has had market-leading share in database management software, initially driven by the role that DB2 and IMS data servers play in powering much of the world's most demanding IT systems. However, as computing solutions continue to evolve, a broader portfolio is required to address the spectrum of data serving needs. As we just discussed, data servers are used to power individual or multiple applications, with varied data and workload characteristics, on computing platforms of different sizes. While it would be ideal to have a single product that could best meet all market requirements, varied operating environments and sometimes conflicting requirements prohibit such an answer. Much as a hammer is not the tool of choice for driving both a screw and a nail, or as each tool in a well-equipped toolbox is typically preferred over the corresponding option in a Swiss army knife, providing our partners and clients with the right capabilities for each of their needs gives them the best solution and puts IBM in the best position to win their business. Having a single product that does everything creates enormous coupling, which makes it an incredibly complex product to develop, test and support. This translates into longer development cycles, longer test cycles and significantly longer release cycles. By having a set of products that meet the diverse requirements of data serving, we can pioneer new function and bring it to market more quickly, and then incorporate that technology into other products where necessary at a fraction of the development cost. For example, while clients in one “band of the requirements spectrum” may prioritize both top OLTP and complex query processing performance and scalability, partners in another may prioritize OLTP performance and ease of application-driven maintenance, while others may prize small size and low unit price over advanced features or top performance characteristics. 
An intra-application data server is packaged within a single application and is invisible to the operations team, who are not burdened with additional time-consuming installation and maintenance. By relieving their clients of this additional cost and complexity, application providers offer a superior solution. Consequently, the application providers require the data servers to be easy and cost-effective for them to include in the application, and self-optimizing to deliver the required application performance and reliability. Alternatively, a single- or multi-application data server is installed as part of the IT infrastructure. It too must offer self-optimization capabilities, including the ability to optimize across applications and workloads of different and potentially variable priority. It must provide an easy-to-use human interface to the administrators, who are potentially managing a complex IT system for many applications. We will review specific requirements addressed by our portfolio: a very small data server for embedding within mobile devices such as PDAs or cell phones (DB2 Everyplace); a compact pure Java data server for including within Java applications (Cloudscape); a “multi-value” or “extended” relational data server for applications requiring flexible or easily changed data schemas (U2: UniVerse and UniData); a pure OLTP intra-application relational data server (IDS); a hierarchical data server with unmatched OLTP performance (IMS); and a multi-purpose, multi-application, multi-structure data server (DB2).
  • The original member of the DB2 family, DB2 for z/OS takes advantage of zSeries' unique capabilities in areas such as high availability, security and high-volume performance. With the combination of DB2 and IMS, zSeries simply provides the most cost-effective, least complex (and in some cases the only viable) solution for the most extreme data serving needs. We have clients with near-petabyte databases (not for reference), and there is no reason applications will be interrupted when they exceed a petabyte. The zSeries hardware/OS platform has the highest rating, EAL5, and DB2 for z/OS has EAL3 pending; combined, they give the highest security rating. As we discussed earlier, among zSeries clients taking a more holistic approach as they develop their vision of an enterprise service-oriented architecture, often built with WebSphere software, we have seen that we can grow our competitive advantage for projects spanning different computing platforms. Where possible we need to use architecture workshops to help our clients develop IT solutions that deliver optimal business value, rather than looking at short-sighted, locally optimized answers. IBM is in a unique position to offer this value, and we miss an opportunity to differentiate ourselves if we do not seek out and nurture relationships with IT leaders with cross-platform responsibilities. We are in the process of developing a set of information architecture workshops that integrate with those used successfully in 2005 by the WebSphere team. The workshops are part of a new program to help you and our clients better understand the unique value of zSeries data services and develop a realistic roadmap for better utilizing them as part of a well-integrated IT system spanning all computing platforms. Again, building a stronger IBM relationship anchored in a value only IBM can offer puts us in a better position to win more business in projects on all platforms throughout the client’s IT system.
  • It is cheaper than ever to gather all sorts of data about the world around us: surveillance cameras are everywhere, medical procedures are based on extensive tests, our manufacturing plants are fully wired, and our stores are monitored for security and customer-relationship management. Not only has the amount (scale) of data increased by many orders of magnitude, but the heterogeneity of this data and the ability to act on it have changed. The new types of data are no longer tables of ordered numbers. They consist of a heterogeneous collection of sensor data, in many different forms and of varying quality. Both the changing scale of this data and its heterogeneous form mean that it is much harder to extract meaning from the data. Information accessibility through the World Wide Web has raised the bar on user expectations and has changed the nature of individual business activities. Addressing ease of access, breadth of relevant results and 'usability of results' issues will be key to enabling agile business decision making. On-demand environments will require access to multiple types of data wherever and whenever needed. Making daily business decisions increasingly relies on the ability to analyze large volumes of both structured and unstructured data. Our research in Information and Interaction is addressing these issues...
  • The vertical applications have demonstrated a dependency on insight and knowledge for customer treatment and product planning scenarios, where reporting and monitoring allow companies to quickly respond to needs. Automotive Quality Insight / Early Warning: Retail and aftermarket quality insight. Helps detect problems in product quality as early as possible using all available data sources. Analysis of integrated information can result in reduced warranty cost, improved quality of production vehicles, and improved safety standards. Basel II and Banking Data Warehouse: The IBM Risk and Compliance – Basel II Offering is an integrated strategic approach to controlling, measuring and managing risk for improved decision making. In addition, there are over 250 consultants in IBM’s Risk and Compliance practice who are specifically trained in the IT issues facing Basel II implementations. The BDW Model now has complete atomic-level structures in place to address all Credit Risk approaches specified in Basel II CP3 (including IRB Advanced), as well as addressing Market Risk and basic Operational Risk as defined in the CP3 documentation. As well as providing the necessary atomic data structures, the BDW Model now has significant support for the aggregations required by Basel II in the Summary Area of the model. In addition to the Basel II support, BDW 3.2 also includes initial support for Anti Money Laundering (AML). Specifically, BDW 3.2 includes three new BSTs and associated Project Views which reflect typical Anti Money Laundering reporting requirements. These AML-specific views were built based on AML requirements from such agencies as the Financial Action Task Force (FATF) and the Financial Crimes Enforcement Network (FinCEN). Insurance Customer Insight: Applies to both insurance and banking customers. The Customer Insight solution consists of five primary offerings focused on the analytics behind multi-channel banking. 
Analytics to Empower Differentiation is designed to identify similar customers and develop appropriate products. Channel Empowerment helps banks utilize their customer channels most efficiently. Customer Management helps to reduce costs through efficient processes and business metrics. Analytics to Empower Compliance focuses on data quality and analytics to allow compliance with regulatory rulings. Data Management to Empower Integration ensures a single customer view. Retail RFID: Pivotal to a truly intelligent supply chain. Improve availability on the shelf. Respond to consumer demand faster. Transform the way you forecast, manage inventory and distribution, and market to consumers in-store with a fast, responsive and flexible supply chain. Cut costs while gaining competitive advantage by delivering value to the end consumer. The RFID solution is part of a comprehensive strategy which allows in-store data to be used for multiple business processes including real-time merchandising, pricing, promotion, inventory management and replenishment. Life Sciences: IBM DiscoveryLink, essentially DB2 Information Integrator under the covers, provides researchers with easy access to the heterogeneous data sources required during their research. With DiscoveryLink, IBM built a global infrastructure for Aventis’s scientists to access and synthesize information from multiple chemical and biological databases inside the company and in the public research community. The solution employs utility outsourcing to alleviate IT concerns and for other non-core processes (variable), uses a single query feature for speed, is extendable, and protects data through redundancy and security protocols (resilient). The project is broadening the firm’s research portfolio and is on track to reduce drug development times by as much as 9 years. 
Healthcare – Aligned Clinical Environment: The new IBM Aligned Clinical Environment is designed to help organizations rapidly access relevant and reliable healthcare information, collaborate on key healthcare issues, and improve healthcare delivery. The new offering moves organizations towards an information-on-demand environment that supports: the creation of patient databases and clinical data warehouses that integrate genetic, clinical and genealogical patient data; better understanding of diseases and identification and validation of potentially effective drugs; targeted treatment decisions, based on a comprehensive profile of patient data; more focused clinical trials and patient recruitment; and privacy, protection and security of patient data. Telecommunications Data Warehouse (TDW): Improve business with informative data analysis. The telecommunications data warehouse solution from IBM can supply customer behavioral information, such as profitability and spending patterns. This information can improve customer relationships, leading to increased customer satisfaction and retention. Our telecommunications data warehouse solution allowed the company to rapidly develop an enterprise-wide data warehouse in less than 6 months. The new data warehouse contains a complete picture of their customers, a prerequisite for other CRM initiatives. Law Enforcement – the new Crime Information Warehouse solution: The Crime Information Warehouse can provide the state, county and large city Police Department with a well-organized, fast, and easy-to-access repository of crime statistics and reports. This solution allows integration of information related to Incidents, Offenses, Arrests, and Calls for Service for use by users within the Department to make more informed decisions. From a single interface, users of the Crime Information Warehouse (CIW) can access and share reports and information in a multitude of formats. 
The insight gained can lead to improved police coverage, safer neighborhoods, and improved officer safety. Life Sciences – Drug Discovery Molecular drug discovery. Clinical genomics is changing the field of Biotech and Pharma to develop targeted therapy for some of the most egregious diseases. Companies are looking to new business intelligence solutions that both share treatment information across the health community and analyze alternative strategies for cancer and HIV. Advances in the processing speed of massively parallelized architectures, such as Blue Gene, will allow complex analysis of gene expression prediction to be performed in sub-second response time.
  • Taxonomy & Categorization: create, import, edit taxonomies; sort documents by topic against a known taxonomy. Phrase Extraction: identify concepts. Search Indexing: build index to support querying. Entity Extraction: extract proper nouns. Crawling: from many sources. Tokenize: identify words. Language ID: detect language. Stem: normalize words to root forms for more efficient indexing. Part of Speech: identify nouns, verbs, adverbs, etc. Fact Extraction: extract roles, events, & relationships.
  • IBM started investing in this space in 1995 with the acquisition of Lotus. Since then, we have quietly but constantly been investing in this business. In 1998, we started the Pervasive group to enable mobile computing. In 2000, we added Portal to provide a single point of access to information and applications across the organization. Most recently, in 2003 we announced our newest product group, Lotus Workplace, our integrated family of collaborative products that integrate people with business processes. Judging by market analysts, awards from media and success in the market, we are clearly leaders in this space. Along the way, we have seen incredible growth. We are leaders in nearly every part of the Workplace market. Let’s take Lotus Notes and Domino, the revenue leader in the integrated collaborative environment market for years. Since acquiring Lotus, IBM has seen more than 3000% growth in the number of users, growing to more than 110M users at the end of 2003. And, despite what our competitors would have you believe, it is an area of our business in which we are continuing to invest and grow. Consider that IBM still has somewhere in the area of 1,000 developers working on future releases of Notes and Domino. In WebSphere Everyplace, our Pervasive Group, we are the leader in Mobile Middleware according to IDC, in Enterprise Mobile Computing according to the Yankee Group, in Enterprise Voice Portals according to Gartner, and in Mobility according to Meta Group. ((((these PvC points MUST change)))) This translates to sheer presence in the marketplace, with 37M smartcards in use today, and now 51 models of General Motors cars plus two from Honda are shipping with IBM embedded speech technology. Honda input, including some auto telematics sparklers. Telematics: Major driver of automotive telematics. 
Visionary thinking with knowledge for today's market needs - Gartner, March 2004. (Original quote) "IBM is a major driver of successful automotive telematics solutions. The company's ability to provide visionary thinking, combined with the knowledge for today's real market needs, gives it the credibility OEMs and suppliers are looking for." (Source: Gartner - March 2004) IBM: Truly holistic automotive offering - Gartner, March 2004. (Original quote) "Automotive businesses today prefer service and technology providers that have a comprehensive understanding of the challenges and opportunities in the automotive industry. IBM is one of the very few companies that have the competence to provide a truly holistic automotive offering." (Source: Gartner - March 2004, Thilo Koslowski, Gartner automotive analyst) Speaking points: the key differentiator in cars is now software. Carmakers need a vendor who provides high-quality, defect-free code, is able to integrate the solution, and has a deep understanding of the industry and its needs. Differentiation leads to customer satisfaction and profits. IBM works in all aspects of telematics: Hyundai Korea is using IBM hardware, software and services to provide GPS navigation delivering up-to-the-minute traffic information; automatic emergency services notification in an accident; data synchronization for PDAs and other mobile devices; e-mail; delivery of weather and news updates; vehicle diagnostics, maintenance and engine data tied to manufacturers, dealers and drivers; remote entry assistance and vehicle theft tracking, all based on open standards for flexibility. IBM works with major telematics suppliers including Motorola, Denso, Delphi, Bosch/Vetronix, LGE, Harman Becker and others to develop open-standards-based in-vehicle Telematics Control Units. 
Devices: IBM's embedded middleware supports more of the handheld operating systems worldwide than any other vendor, including Palm, Linux, Symbian, RIM, and Pocket PC. Original quote: With its latest version, IBM WebSphere Everyplace Access Software now supports more of the handheld operating systems worldwide than any other vendor, supporting Palm, Linux, Symbian and Pocket PC. Together, these handheld operating systems make up at least 90% of the worldwide market. (Todd Kort, Gartner, 2003) B. Sparklers for customer charts - bullet points: Honda: 2003 J.D. Power and Associates, rating customer satisfaction with in-car navigation systems: three cars in the top five use voice technology powered by IBM's Embedded ViaVoice (Honda Accord, Acura TSX, Acura MDX). “The 2003 Model Year Accord, 2004 Model Acura TSX, and the 2003 Model Acura MDX (are) #1,2, and 5 in JD Power. I think this system ranks highest in overall satisfaction because of IBM Voice-Control Systems. Thank you !!!!!” - Koichi Kojima, navigation/voice system lead for Honda Research. With WebSphere Portal, we're seeing explosive growth, with 61% growth year-to-year in the number of customers. Customers are using our portal to serve their customers, for partners or the supply chain, and for employees. So many companies are using our portal now that we are seeing 300% year-to-year growth in the number of portlets available for use. This makes integrating sources of information and access to applications much easier to do. And, again, we are the market leaders in this space according to IDC and Gartner. Most recently, the Lotus Workplace portfolio has been growing, with 8 new offerings in the last 12 months. That is phenomenal portfolio growth. And with the 2.0 release this month, we are evolving those Workplace products into a Workplace platform with the capabilities and tools necessary to create new Workplace applications. 
These innovative new products are capturing market share, with more than 320 new customers already. Along the way, we have been building a phenomenal portfolio. The breadth of our products covers nearly all aspects of the user experience. And, as we have been developing these products, we have been converging our technologies, building upon our successes in one area and leveraging them in another. We see that with Lotus Workplace especially, but we are “cross-pollinating” our technology across our products. So IBM has been in the market for a long time. We have gained quite a lot of experience in delivering the kinds of products and technologies that customers and business partners want and need to use every day. [CLICK]
  • Contact: Dave Ferrucci / Arthur Ciccolo, 3/03. IBM’s Unstructured Information Management Architecture (UIMA) is being developed under a world-wide Research division effort to support the rapid integration and deployment of a wide variety of analytical techniques to assist in the integration and processing of large volumes of structured and unstructured information. Implementations of UIMA provide middleware which will be used to build applications that need to sift through huge quantities of structured and unstructured information to discover, translate, summarize, categorize and organize the knowledge relevant to important applications ranging from national and business intelligence to bioinformatics. Core components include a powerful semantic search engine, a central document and meta-data store, and an Analysis Engine framework, all communicating through XML and web services. Current development is focused on the deployment of text analysis engines; however, the architecture is being extended to other unstructured media including voice, audio and video. The primary design objectives of IBM’s UIMA are twofold. The first is to provide a solid infrastructure for composing advanced search and analysis applications from reusable components which may be embedded in product platforms or delivered as highly distributed and scalable stand-alone applications or services. The second is to facilitate the rapid combination of analytical techniques in support of the Combination Hypothesis. This is the hypothesis that significant scientific advances can be made in the precision of analysis and search results if independently developed techniques with different strengths and weaknesses can be quickly combined to produce superior and otherwise uncharted solutions. Different unstructured artifacts (examples include text documents and video, audio or voice files) are gathered and organized into collections. 
The application, with the help of the analysis engine directory services, selects the types of analysis engines that should be applied to the collection. [Example analysis engines include language translators, document summarizers, document classifiers, scene detectors, geography detectors, glossary builders etc. Each analysis engine specializes in discovering relevant concepts (or "semantic entities") otherwise unidentified in the document text (or video image, for example).] Applications feed the collection through the collection processing manager (CPM), whose primary responsibility is to apply the selected analysis engine(s) to each element (e.g., document, video etc.) in the collection and to further process the results, ensuring, for example, that the results of the analysis may be associated as meta-data with the element in the document meta-data store. [The analysis of each collection element is captured in a specialized XML structure called the CAS or Common Analysis System (not indicated in the diagram). The CAS holds all the results of analysis in a common representation medium for communication between UIMA components and for storing key meta-data.] [Analysis engines, to do their job, may consult a wide variety of structured knowledge sources. They do this in a uniform way through Structured Knowledge Access components called Knowledge Source Adapters (KSAs). These objects manage the technical communication and semantic mapping necessary to deliver knowledge encoded in databases, dictionaries, knowledge bases and other structured sources to the analysis engines in a uniform way and in the language the analysis engine can understand. Different KSAs may be discovered, based on the type of knowledge they provide, through the KSA Directory Service.] The CPM may also be configured by the application to extract a variety of elements from the analysis (i.e., the CAS) and index these results in the semantic search engine. 
Not only may tokens (i.e., simple words) be indexed for efficient search, but the concepts (or "semantic entities") discovered as part of analysis may also be indexed and therefore made accessible at search time. UIMA calls for an advanced search capability that allows for querying for artifacts based on a combination of the concepts and tokens discovered by analysis engines. After the CPM has processed the collection, the application can use the results of analysis stored in the meta-data store, as well as those captured in the rich index of tokens and concepts processed by the search engine, to deliver the key knowledge required by the user in the ideal form. At this stage the application may additionally request translations, summaries, or concept highlighting of the results. Post analysis may also be performed on the fly to further analyze input dynamically at the request of the application.
  • Distillery provides a platform for adaptively assembling, configuring, managing, scheduling and running analytics. UIMA provides a development framework for building, describing & integrating component analytics.
  • Hierarchical Data Model: Hierarchical databases supported only queries that were designed by the original data administrator. IMS, the first hierarchical data model, became commercial in 1968 (Introduction to data bases, C.J. Date, 1985, p. 503). Relational Data Model: DB2 first version 1984? Extensible Data Model (XML): XML 1 standard 1998. XML is the acronym for eXtensible Markup Language, the universal format for structured documents and data on the Web. XML is an industry-standard format administered by the World Wide Web Consortium (W3C). Domain Specific Ontologies: OWL standard 2004. OWL is a Web Ontology Language. OWL builds on RDF and RDF Schema and adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes. (More info here: http://www.w3.org/2004/OWL)
  • Source Sandesh Bhat/ Al Miyashita
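
The UIMA note above describes a flow in which a collection processing manager (CPM) applies the selected analysis engines to each element of a collection, stores the results as meta-data, and indexes both tokens and discovered concepts for search. The following is a minimal sketch of that flow in plain Python, offered only as an illustration. Every name in it (`CAS`, `CollectionProcessingManager`, `tokenizer`, `make_entity_detector`, the in-memory dictionaries) is a hypothetical stand-in chosen for this example and is not the UIMA API; the "entity detector" is a trivial dictionary lookup standing in for a real analysis engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class CAS:
    """Hypothetical stand-in for the CAS: one collection element plus its analysis results."""
    doc_id: str
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# An analysis engine is modeled here as any function that annotates a CAS in place.
AnalysisEngine = Callable[[CAS], None]


def tokenizer(cas: CAS) -> None:
    """Toy engine: lower-cased whitespace tokens."""
    cas.annotations["token"] = cas.text.lower().split()


def make_entity_detector(known_entities: Set[str]) -> AnalysisEngine:
    """Toy engine factory: marks words found in a fixed entity dictionary (case-sensitive)."""
    def detect(cas: CAS) -> None:
        words = [w.strip(".,") for w in cas.text.split()]
        cas.annotations["entity"] = [w for w in words if w in known_entities]
    return detect


class CollectionProcessingManager:
    """Hypothetical CPM: runs the engines, stores meta-data, builds a token/concept index."""

    def __init__(self, engines: List[AnalysisEngine]) -> None:
        self.engines = engines
        self.metadata_store: Dict[str, Dict[str, List[str]]] = {}
        self.index: Dict[Tuple[str, str], Set[str]] = {}

    def process(self, collection: Dict[str, str]) -> None:
        for doc_id, text in collection.items():
            cas = CAS(doc_id, text)
            for engine in self.engines:          # apply each selected engine in turn
                engine(cas)
            self.metadata_store[doc_id] = cas.annotations
            for kind, values in cas.annotations.items():
                for value in values:             # index tokens and concepts alike
                    self.index.setdefault((kind, value), set()).add(doc_id)

    def search(self, token: Optional[str] = None, entity: Optional[str] = None) -> Set[str]:
        """Return ids of documents matching every criterion that was supplied."""
        results = set(self.metadata_store)
        if token is not None:
            results &= self.index.get(("token", token.lower()), set())
        if entity is not None:
            results &= self.index.get(("entity", entity), set())
        return results


# Usage with made-up documents.
collection = {
    "d1": "IBM ships DB2 for Linux.",
    "d2": "ibm research builds prototypes.",
}
cpm = CollectionProcessingManager([tokenizer, make_entity_detector({"IBM", "DB2"})])
cpm.process(collection)
print(sorted(cpm.search(token="ibm")))                # ['d1', 'd2']
print(sorted(cpm.search(token="ibm", entity="IBM")))  # ['d1']
```

The second query shows the point made in the note: a search that combines a plain token with a discovered concept can narrow results in a way a token-only index cannot.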

    1. Evolution of Database Technology. C. Mohan, PhD, IBM Fellow & IBM India Chief Scientist; Member, IBM Software Group, Asset Architecture & Information Management Architecture Boards. http://www.almaden.ibm.com/u/mohan/ [email_address]
    2. Some of Our Database Research Legacy: Invention of Relational DBMS & SQL; Research prototypes (System R & SQL; R* Distributed DBMS; Starburst Extensible Object-Relational DBMS; Garlic Heterogeneous DBMS); Product Contributions (Data sharing on DB2 390 Sysplex; DB2 UDB Query Processor; Intelligent Miner; Lotus Notes R5 Recovery; Discovery Link & DB2 Information Integrator); 6 IBM Fellows from team of < 50
    3. Why We Have Experience with Customers: Over 2 decades of partnership with SWG Toronto & SVL (Incorporation of Starburst prototype into DB2; Component Owners of DB2 for LUW’s Query Compiler; Versions 2 – 5 (1992-1997); Dealt with customer APARs, Visits, & Presentations); Responsible for many DB2 innovations (Query Graph Model (internal query representation, key to extensibility); Query ReWrite and Optimizer technology; ARIES transaction methods; Triggers and Constraints; Star Join and Hash Join; Object-relational features; Automatic Summary Tables (materialized views); Visual Explain; Index Advisor); Respected for our vision (World-class publications in leading database conferences; Cognizant of industry trends)
    4. Leveraging Technology and People: IMS Development; DB2 Development; IDS / U2 Development; Customer Requirements; IBM Products; IBM Research
    5. SVL DB2 UDB for z/OS & OS/390 IMS Business Intelligence Content Management DB2 Everyplace Red Brick Icing Traditional AD Languages Boeblingen DB2 Text Extenders SAP/R3 Enablement Intelligent Miner for Data Intelligent Miner for Text Somers Hawthorne Advanced Technology Almaden Advanced Technology Austin GBIS Portland XPS & DB2 Lenexa IDS Boulder & Denver Content Management U2 Datablades Boca Raton & Miami EMMS LA Informix Support Rochester DB2 UDB for AS/400 Toronto DB2 UDB for UNIX, Windows, & OS/2 IBM Information Management Teams Beijing Information Integration DB2 for zOS Content Management DB2 and IMS tools Las Vegas Entity Analytics Over 6000 employees worldwide Menlo Park & Oakland IDS XPS JDBC Visionary Cloudscape Datablades Object Connect & Translator Content Management India DB2 UDB Service Business Intelligence IDS Yamato High Speed Inverted Index Search Business Intelligence Content Management Hursley Enterprise Master Data Solutions. India Software Lab (1700 employees; Broad range of skills – all SWG Brands; Linux Competency Center); DB2 Lab within ISL (100+ developers; Lab based services teams – DB2, CM, BI); Other Resources (India Research Lab, http://www.research.ibm.com/irl/projects//; Solution Porting Center; Education Center for IBM Software; IBM Academic Initiative)
    6. A Spectrum of Data Serving Requirements. Platform: Mobile, Desktop, Small Servers, Large Servers. Data Size: Micro, Compact, Large, Extremely Large. Workload: Batch, Online Transactions, Real-time Analysis, Data Mining. Structure: Hierarchical, Relational, Multi-Value, XML. OS: Symbian, PalmOS, Windows, Linux, Unix(s), i5/OS, z/OS. Scope: Embedded, Intra-application, Single application, Multi-application. Support: None, Web/E-mail, Business hours, 24x7.
    7. 7. Products to Match the Spectrum of Data Serving Needs DB2 Everyplace OLTP Relational Mobile Embedded Linux PalmOS Symbian Cloudscape OLTP Relational Intra-App / Single-App Java IDS OLTP Relational Intra-App / Single-App AIX, etc. Linux Windows DB2 OLTP & Analysis Relational & XML Single / Multi-App z/OS I5/OS AIX, etc. Linux Windows IMS OLTP Hierarchical Single / Multi-App z/OS U2 OLTP Multi-Value Intra-App / Single-App AIX, etc. Linux Windows Superior capabilities across the spectrum of requirements
    8. 8. DB2 for z/OS <ul><li>The power and function of an open, industry standard data server with zSeries’ industry leading availability, performance, and security </li></ul><ul><li>What it takes to be the industry’s most extreme data server </li></ul><ul><li>Continuous application availability measured in years </li></ul><ul><li>Ability to process over 1B SQL transactions per hour </li></ul><ul><li>Uninterrupted growth from 1 byte to over a peta-byte </li></ul><ul><li>Serving 100s of applications for 100,000s of users </li></ul><ul><li>US Government’s highest security classification (zSeries) </li></ul><ul><li>Support for industry standards: XML, Web services, Java, C, COBOL </li></ul><ul><li>Support for complex business applications: SAP, PeopleSoft, Siebel </li></ul>Extreme qualities of service XML and Relational data server
    9. 9. Technology Evolution with Mainframe Specialty Engines Internal Coupling Facility (ICF) 1997 Integrated Facility for Linux (IFL) 2001 System z9 Application Assist Processor (zAAP) 2004 IBM System z9 Integrated Information Processor (IBM zIIP) planned for 2006 <ul><li>Building on a strong track record of technology innovation with specialty engines, IBM intends to introduce the System z9 Integrated Information Processor </li></ul><ul><li>Support for new workloads and open standards </li></ul><ul><li>Designed to help improve resource optimization for eligible data workloads within the enterprise </li></ul><ul><li>Centralized data sharing across mainframes </li></ul><ul><li>Incorporation of Java into existing mainframe solutions </li></ul>
    10. 10. Data Challenges <ul><li>Variety, Velocity, and Volume </li></ul><ul><li>New composite applications need data from multiple sources </li></ul><ul><ul><li>Consumers expect holistic, personalized, and value-added content </li></ul></ul><ul><ul><li>Relational, XML, packaged applications, content repositories, file systems all contain critical business information </li></ul></ul><ul><li>Increasing emphasis on current data </li></ul><ul><ul><li>Real-time analytics </li></ul></ul><ul><ul><li>Business activity monitoring </li></ul></ul><ul><li>Petabytes will be the measure of available online data </li></ul><ul><ul><li>All client interactions are important (e.g., instant messages, audio records, web traffic, …) </li></ul></ul><ul><ul><li>Internet and intranet content </li></ul></ul>The world produces 250MB of information every year for every man, woman and child on earth. Common Database Sizes, 1999 to 2004: Transactions 10-100 GB to 1s TB; Warehouses 100s GB - 1 TB to 100s TB; Marts 1 - 20 GB to 1s TB; Mobile 100s MB to 10s GB; Pervasive 100s KB to 1s GB. Growth factors shown on the chart: 10X, 100X, 100X, 1,000X, 10,000X. 37% CGR Disk Growth ’96-’07. 70,000 TB of TV and Radio content in 2002 alone; 30% growth/year.
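The growth rates quoted on this slide (37% compound growth for disk over ’96-’07, 30% per year for TV and radio content) can be turned into cumulative factors with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the rates compound once per year and that ’96-’07 spans 11 compounding periods, neither of which the slide states.

```python
import math


def cumulative_growth(annual_rate: float, years: int) -> float:
    """Total growth factor after compounding annual_rate once per year."""
    return (1.0 + annual_rate) ** years


def years_to_multiply(annual_rate: float, factor: float) -> float:
    """Years of compounding at annual_rate needed to grow by the given factor."""
    return math.log(factor) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Assumption: 1996 to 2007 is treated as 11 annual compounding periods.
    disk_factor = cumulative_growth(0.37, 11)
    print(f"Disk growth at 37%/year over 11 years: {disk_factor:.1f}x")

    # Doubling times implied by the two rates quoted on the slide.
    print(f"Doubling time at 37%/year: {years_to_multiply(0.37, 2):.2f} years")
    print(f"Doubling time at 30%/year: {years_to_multiply(0.30, 2):.2f} years")
```

Under these assumptions, 37% per year compounds to a bit over 30x across the period, which is why annual rates that look modest still lead to order-of-magnitude changes in database sizes.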
    11. 11. Addressing the Changing Characteristics of Data Actionability Heterogeneity Scale Satellite & Surveillance Images and Video Gene Sequences Transactions Text and Web Increasing need to manage and analyze new data types Protein Folding
    12. 12. Key Customer Pain Points <ul><li>Can’t Find Information – Discovery </li></ul><ul><li>Can’t combine Information – Integration </li></ul><ul><li>Can’t extract value from Information – Insight </li></ul><ul><li>Can’t consume Information – Dissemination </li></ul>
    13. 13. Research in Information and Interaction Drive our leadership technologies for search, structured and unstructured information processing and analytics, natural language processing, and conversational and multimodal interaction, across multiple tiers of business activities in SWG products and solutions. Foster the exploitation of components with these leading research technologies in IGS services offerings. CM Information Retrieval NLP Analytics Video Analysis Conversational and Multimodal Interactions Unstructured Information Management Information Management Database Synthesis Information Integration Metadata Speech Recognition
    14. 14. Worlds of Structured & Unstructured Data Come Together Analytical Complexity Collect Store Retrieve Drill Mine ETL Warehouse SQL OLAP Cluster, Classify, .. Crawl ECM Search Navigate Cluster, Classify, .. Solutions II Structured Data Unstructured Data
    15. 15. Need for Business Intelligence <ul><li>Loyalty </li></ul><ul><li>Profitability </li></ul><ul><li>Buyer Behavior </li></ul><ul><li>Targeted Offers </li></ul>Homeland Security <ul><li>Internet Buzz </li></ul><ul><li>Anti-Money Laundering </li></ul><ul><li>Border Control </li></ul><ul><li>Crime Information </li></ul><ul><li>Globalization </li></ul><ul><li>Business Controls </li></ul><ul><li>Mergers and Acquisitions </li></ul><ul><li>Supply Chain Efficiencies </li></ul>Accountability and Compliance Customer Knowledge Business Performance <ul><li>Risk Management </li></ul><ul><li>Fraud and Abuse </li></ul><ul><li>Public Protection </li></ul>HIPAA Basel II Patriot Act Sarbanes-Oxley Capitalism and Its Troubles: A Survey of International Finance -May 24, 2002 Preparing for terror How scared should you be? Nov 28th 2002 From The Economist print edition
    16. 16. Industry Solutions Deliver Insight On Demand <ul><li>Law Enforcement </li></ul><ul><ul><li>Crime Information Warehouse </li></ul></ul><ul><ul><li>Entity Resolution </li></ul></ul><ul><ul><li>Anti Money Laundering </li></ul></ul><ul><li>Banking </li></ul><ul><li>Basel II and Banking Data Warehouse </li></ul><ul><li>Entity Resolution </li></ul><ul><li>Health Care </li></ul><ul><li>Aligned Clinical Environment </li></ul><ul><li>Retail </li></ul><ul><li>RFID </li></ul><ul><li>Retail Data Model </li></ul><ul><li>Telco </li></ul><ul><li>Telco Data Warehouse </li></ul><ul><li>Insurance </li></ul><ul><li>Customer Insight </li></ul><ul><li>IIW </li></ul><ul><li>Automotive </li></ul><ul><li>Quality Insight Early Warning </li></ul><ul><li>Life Sciences </li></ul><ul><li>Drug Discovery </li></ul>
    17. 17. OmniFind Key Technologies Content Crawling <ul><li>Scalable Web crawler </li></ul><ul><li>Data Source crawlers </li></ul><ul><li>Content Push </li></ul>Parsing/ Tokenizing <ul><li>HTML/XML </li></ul><ul><li>200+ Doc Filters </li></ul><ul><li>Advanced Linguistics </li></ul>Search Collections Categorization <ul><li>Taxonomy </li></ul><ul><li>Rule-based </li></ul>Annotation <ul><li>Text Analytics </li></ul><ul><li>Plug-in </li></ul>Indexing <ul><li>Global Analysis </li></ul><ul><li>Static Ranking </li></ul><ul><li>Store </li></ul><ul><li>Dynamic Ranking </li></ul><ul><li>Fielded Search </li></ul><ul><li>Dynamic Summary </li></ul><ul><li>Parametric Search </li></ul><ul><li>Spell Checking </li></ul>Searching Security
    18. 18. Content Management Portfolio Strategy <ul><li>Capture, store, and manage all forms of content </li></ul><ul><li>Complete and scalable content management functionality </li></ul><ul><ul><li>Document management </li></ul></ul><ul><ul><li>Image management </li></ul></ul><ul><ul><li>Digital asset management </li></ul></ul><ul><ul><li>Report management </li></ul></ul><ul><ul><li>Web content management </li></ul></ul><ul><ul><li>Records management </li></ul></ul><ul><ul><li>Digital rights management </li></ul></ul><ul><ul><li>Email/Messaging archiving and management </li></ul></ul><ul><ul><li>Collaboration tools </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Enterprise-scale business process management </li></ul><ul><li>Cross-portfolio, out-of-the-box integration </li></ul><ul><li>Rich, common client platform </li></ul>
    19. 19. IBM Content Management Platform Roadmap 4Q2004 1Q2005 2005 2006 … and Beyond WebSphere Portal V5.1 Embeds DB2 Content Manager Runtime Edition (JCR) Records Manager V4.1.1 A Dynamic RM Infrastructure Workplace Web Content Management V2.0 Leveraging DB2 Content Manager and WebSphere Portal Framework DB2 Content Manager V8.3 Enhance Doc Routing Enable BPM Extend Integration Capabilities Seamless RM DB2 Document Manager V8.3 Compliance/RM Extending Native Language Support DB2 CommonStore V8.3 Full-Text Search Seamless RM First Step ECM Unified Client New Portlets J2EE Web Components Extend to DPM Extend Document Management Email/Messaging Archiving and Management Enhancements Physical Records Management Virtual Records Management WCM Leveraging Workplace and DB2 Content Manager Runtime (JCR) Common Content Repository Workplace Unified End-User Experience (Client) Event Framework Integrated / Interoperable DPM/BPM Extended ECM Capabilities as Add-On Features Enterprise JCR IBM CM SDK Enterprise Content Integration – JSR170 DB2 Content Manager Runtime in ISV Applications LDDM* Fully Supports JSR170 Autonomic Capabilities Content Preservation Content Intelligence Pervasive Enablement … and More * Lotus Domino Document Manager
    20. 20. Query Optimization <ul><li>Industry-Leading Optimization </li></ul><ul><li>Extensible – SQL to XQuery! </li></ul><ul><li>Optimizes for Parallel </li></ul><ul><ul><li>I/O accesses </li></ul></ul><ul><ul><li>Within a node (SMP) </li></ul></ul><ul><ul><li>Between nodes (MPP) </li></ul></ul><ul><li>Powerful for complex OLAP & BI queries </li></ul><ul><li>Industry-Strength Engineering </li></ul><ul><li>Portable </li></ul><ul><ul><li>Across HW & SW platforms </li></ul></ul><ul><ul><li>Databases of 1 GB to > 300 TB </li></ul></ul><ul><li>Continuing "technology pump" of improvements from Research </li></ul>
    21. 21. Unstructured Information Management Architecture <ul><li>Common Research infrastructure for advancing Text Analysis and NLP capability </li></ul><ul><ul><li>Promotes re-use of best-of-breed components </li></ul></ul><ul><ul><li>Promotes combination hypothesis through ease of integration </li></ul></ul>Unstructured Information Application Libraries Specialized Application Libraries Provide basic functions common to a broad class of application libraries & applications (e.g. Glossary Extraction Taxonomy Generation, Classification, Translation, etc.) Question Answering e-Commerce Semantic Search Engine Token and Concept Indexing Query Key words, concepts, spans, ranges -> Ranked Hit List National & Intelligence Business Bioinformatics Technical Support Document & Meta Data Store Documents with meta data based on key-value pairs Enables view & collection management (Text) Analysis Engine (TAEs) Combination of analysis engines employing a variety of analytical techniques and strategies Structured Knowledge Access Knowledge Source Adapters - (KSAs) deliver content from many structured knowledge sources according to central ontologies Collection Processing Manager KSA Directory Service Dynamic query & delivery of KSAs TAE Directory Service Dynamic query & delivery of TAEs UIMA Standard Application Libraries Relevant Application Knowledge Structured Data UIM Solutions
    22. 22. Analytics bridge the Unstructured & Structured worlds Unstructured Information UIMA High-Value Most Current Content Fastest Growing BUT ... Buried in Huge Volumes – Lots of Noise Implicit Semantics Inefficient Search Explicit Structure Explicit Semantics Efficient Search Focused Content Text , Chat, Email, Audio, Video Indices DBs KBs <ul><li>Identify Semantic Entities, Induce Structure </li></ul><ul><ul><li>Chats, Phone Calls, Transfers </li></ul></ul><ul><ul><li>People, Places, Org, Events </li></ul></ul><ul><ul><li>Times, Topics, Opinions, Relationships </li></ul></ul><ul><ul><li>Threats, Plots, etc. </li></ul></ul>UIMA - The Big Picture Structured Information
    23. 23. Evolution of Metadata Hierarchical Data Model Rigid Metadata Single Application Domain Specific Ontologies Flexible Metadata Cross Industry Integration Increased Business Value of Metadata Syntactic annotation of data: what this data represents Semantic annotations of data: what this data means Relational Data Model Rigid Metadata Integration Within Enterprise Extensible Data Model (XML) Flexible Metadata Integration Within Industry 1970 1990 2000 2010 1980
    24. 24. Information Management Trends <ul><li>Information Intensive Applications </li></ul><ul><ul><li>Shift from transaction-centric to information-intensive applications </li></ul></ul><ul><li>Information Diversity </li></ul><ul><ul><li>Delivering insight over increasingly diverse sources of information </li></ul></ul><ul><li>New Business & Delivery Models </li></ul><ul><ul><li>Information as a Service, Outsourcing, New Licensing Models </li></ul></ul><ul><li>Democratization of Information </li></ul><ul><ul><li>Changing User Expectations & the “Parent Test” </li></ul></ul><ul><li>Massive Collaboration & Societal Intelligence </li></ul><ul><ul><li>Collaboration over shared information to create business insight </li></ul></ul>
    25. 25. Spatiotemporal Epidemiological Modeler <ul><li>STEM is a tool to help scientists and public health officials create and test models for emerging infectious diseases. </li></ul><ul><ul><li>Understand disease dynamics </li></ul></ul><ul><ul><li>Test outcomes of preventative actions </li></ul></ul><ul><li>Diverse Data Sources </li></ul><ul><ul><li>GIS data for every county – borders, populations, shared borders, highways, airports </li></ul></ul><ul><ul><li>Susceptible/Infectious/Recovered (SIR) models </li></ul></ul><ul><ul><li>Susceptible/Exposed/Infectious/Recovered (SEIR) models </li></ul></ul><ul><ul><li>Multi-serotype disease models </li></ul></ul><ul><ul><li>Public health policy events </li></ul></ul><ul><ul><li>User specified disease vectors </li></ul></ul>http://www.alphaworks.ibm.com/tech/stem
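The Susceptible/Infectious/Recovered (SIR) model class named on this slide can be illustrated with a minimal single-population sketch. This is not STEM's implementation and does not use any STEM API: the function name `simulate_sir`, the parameters `beta` (transmission rate) and `gamma` (recovery rate), and the forward Euler integration are all assumptions chosen for illustration. It applies the textbook equations dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I, with none of the spatial (county, highway, airport) coupling the slide describes.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the textbook SIR equations with forward Euler steps.

    Returns a list of (day, susceptible, infectious, recovered) tuples,
    one entry per whole day, starting at day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0, s, i, r)]
    steps_per_day = int(round(1.0 / dt))
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # People moving S -> I and I -> R during one small time step.
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters: 1,000 people, one initial case,
    # beta = 0.3 per day and gamma = 0.1 per day.
    run = simulate_sir(1000, 1, beta=0.3, gamma=0.1, days=160)
    peak_day, _, peak_infected, _ = max(run, key=lambda row: row[2])
    print(f"Peak of about {peak_infected:.0f} infectious on day {peak_day}")
    print(f"Recovered by day 160: about {run[-1][3]:.0f}")
```

An SEIR variant would add an Exposed compartment between S and I with its own transition rate; testing a preventative action, as the slide mentions, would amount to lowering `beta` part-way through the run and comparing the two histories.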
    26. 26. Metadata-driven Design for Integration Web Service Build These Using These New Business Process New Integrated View Legacy and packaged apps Relational databases XML documents New DataFlow WBI II ETL 40% of IT budgets may be spent on integration 30% of people’s time is searching for relevant information 30% of development time is copy management <ul><li>Remember It </li></ul><ul><li>Remember relationships and dependencies </li></ul><ul><li>Find It </li></ul><ul><li>Find and visualize related information </li></ul><ul><li>Connect It </li></ul><ul><li>Generate the integration glue </li></ul>
    27. 27. Metadata Will Be Used to Facilitate Information and Application Integration <ul><li>Today – manual integration, custom hard-wired integration </li></ul><ul><li>Tomorrow – semi-automated integration by using tools and connectors </li></ul><ul><li>Future – automated integration through metadata standards and tools </li></ul><ul><ul><li>Dictionaries </li></ul></ul><ul><ul><li>Taxonomies </li></ul></ul><ul><ul><li>Ontologies </li></ul></ul>