Oracle sql plsql & dw

This Document will contain Datawarehousing and SQL and PL/SQL Concepts.

Published in: Education

Transcript

  • 1. DWH Concepts

What is a DATA WAREHOUSE?
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

A data warehouse is a database designed to support a broad range of decision tasks in a specific organization. It is usually batch updated and structured for rapid online queries and managerial summaries. Data warehouses contain large amounts of historical data. The term data warehousing is often used to describe the process of creating, managing and using a data warehouse.

What are the characteristics of a DATA WAREHOUSE?
The characteristics of a DWH are:
• Subject-Oriented: DWHs are designed to help you analyze data. For example, to learn more about the company's sales data, you can build a warehouse that concentrates on sales. This ability to define a DWH by subject matter (sales, in this case) makes the DWH subject-oriented.
• Integrated: This is closely related to subject orientation. DWHs put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
• Nonvolatile: Once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred, and what has already happened never changes.
• Time-Variant: In order to discover trends, analysts need large amounts of data. This is very much in contrast to OLTP systems, where performance requirements demand that historical data be moved to an archive. A DWH's focus on change over time is what is meant by the term time-variant.

What are the goals of a DATA WAREHOUSE?
  • 2. The goals of a DATA WAREHOUSE are:
• To provide a reliable, single integrated source of key corporate information.
• To give end users access to their data without reliance on reports produced by the information systems department.
• To allow analysts to analyze corporate data and even produce predictive "what if" models from that data.
The data warehouse is simply one component of modern reporting architectures. The real goal of reporting systems is decision support (or its modern equivalent, business intelligence): to help people make better, more intelligent decisions.

When should a company consider implementing a data warehouse?
A data warehouse, or a more focused database called a data mart, should be considered when a significant number of potential users are requesting access to a large amount of related historical information for analysis and reporting purposes. So-called active or real-time data warehouses can provide advanced decision support capabilities.

What are the uses of a DATA WAREHOUSE?
• It separates the analysis workload from the transaction workload and enables an organization to consolidate data from several sources.
• It manages the process of gathering data and delivering it to business users.
• It is used to analyze data.
• It puts data from disparate sources into a consistent format.

What are the benefits of data warehousing?
Some of the potential benefits of putting data into a data warehouse include:
1. Improving turnaround time for data access and reporting;
2. Standardizing data across the organization so there will be one view of the "truth";
3. Merging data from various source systems to create a more comprehensive information source;
4. Lowering costs to create and distribute information and reports;
5. Sharing data and allowing others to access and analyze the data;
6. Encouraging and improving fact-based decision-making.

What are the limitations of data warehousing?
  • 3. The major limitations associated with data warehousing are related to user expectations, lack of data and poor data quality. Building a data warehouse creates some unrealistic expectations that need to be managed. A data warehouse doesn't meet all decision support needs. If needed data is not currently collected, transaction systems need to be altered to collect the data. If data quality is a problem, the problem should be corrected in the source system before the data warehouse is built. Software can provide only limited support for cleaning and transforming data. Missing and inaccurate data cannot be "fixed" using software. Historical data can be collected manually, coded and "fixed", but at some point source systems need to provide quality data that can be loaded into the data warehouse without manual clerical intervention.

What data is stored in a data warehouse?
In general, organized data about business transactions and business operations is stored in a data warehouse. But any data used to manage a business, or any type of data that has value to a business, should be evaluated for storage in the warehouse. Some static data may be compiled for initial loading into the warehouse. Any data that comes from mainframe, client/server, or web-based systems can then be periodically loaded into the warehouse. The idea behind a data warehouse is to capture and maintain useful data in a central location. Once data is organized, managers and analysts can use software tools like OLAP to link different types of data together and potentially turn that data into valuable information that can be used for a variety of business decision support needs, including analysis, discovery, reporting and planning. Database administrators (DBAs) have always said that having non-normalized or de-normalized data is bad.

What are the methodologies of Data Warehousing?
Every company has a methodology of its own. To name a few, the SDLC methodology and the AIM methodology are commonly used. Other methodologies include AMM, World Class methodology and many more.

How does my company get started with data warehousing?
Build one! The easiest way to get started with data warehousing is to analyze some existing transaction processing systems and see what type of historical trends and comparisons might be interesting to examine to support decision making. See if there is a "real" user need for integrating the data. If there is, then IS/IT staff can develop a data model for a new schema, load it with some current data and start creating a decision support data store using a database management system (DBMS). Find some software for query and reporting and build a decision support interface that's easy to use. Although the initial data warehouse/data-driven DSS may seem to meet only limited needs, it is a "first step". Start small and build more sophisticated systems based upon experience and successes.
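As a rough illustration of the "start small" advice above, the following sketch loads a handful of transaction rows into a tiny dimension table and fact table and then asks a monthly trend question. It uses Python's built-in sqlite3 module purely for convenience; the table names (dim_product, fact_sales), the sample orders and both function names are assumptions made for this example, not part of the original material.

```python
import sqlite3

# Hypothetical rows from an existing transaction system:
# (order_id, order_date, product, amount).
SOURCE_ORDERS = [
    (1, "2023-01-15", "widget", 120.0),
    (2, "2023-01-20", "gadget", 80.0),
    (3, "2023-02-03", "widget", 200.0),
    (4, "2023-02-17", "widget", 50.0),
]


def build_starter_warehouse(orders):
    """Create a minimal dimension/fact schema and load it from the source rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT UNIQUE
        );
        CREATE TABLE fact_sales (
            order_id    INTEGER,
            order_month TEXT,
            product_key INTEGER REFERENCES dim_product(product_key),
            amount      REAL
        );
        """
    )
    for order_id, order_date, product, amount in orders:
        # Add the product to the dimension only if it is not there yet.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (product_name) VALUES (?)",
            (product,),
        )
        (product_key,) = conn.execute(
            "SELECT product_key FROM dim_product WHERE product_name = ?",
            (product,),
        ).fetchone()
        # Keep only the year-month part of the date for trend analysis.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (order_id, order_date[:7], product_key, amount),
        )
    return conn


def monthly_sales(conn):
    """Return (month, product, total amount) rows: a simple historical trend."""
    return conn.execute(
        """
        SELECT f.order_month, p.product_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p USING (product_key)
        GROUP BY f.order_month, p.product_name
        ORDER BY f.order_month, p.product_name
        """
    ).fetchall()


if __name__ == "__main__":
    for row in monthly_sales(build_starter_warehouse(SOURCE_ORDERS)):
        print(row)
```

A real first iteration would load from the actual source systems and use the organization's DBMS, but the shape is the same: a small schema, a simple load, and one query that answers a question users actually have.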
  • 4. What are the data warehouse implementation schemes?

What type of indexing mechanism do we need to use for a typical data warehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes. To my knowledge, SQL Server does not support bitmap indexes, whereas Oracle does.

What are the steps to build the data warehouse?
• Gathering business requirements
• Identifying sources
• Identifying facts
• Defining dimensions
• Defining attributes
• Redefining dimensions and attributes
• Organizing the attribute hierarchy and defining relationships
• Assigning unique identifiers
• Additional conventions: cardinality/adding ratios

How often should data be loaded into a data warehouse from transaction processing and other source systems?
It all depends on the needs of the users, how fast data changes and the volume of information that is to be loaded into the data warehouse. It is common to schedule daily, weekly or monthly dumps from operational data stores during periods of low activity (for example, at night or on weekends). The longer the gap between loads, the longer the processing time for the load when it does run. A technical IS/IT staffer should make some calculations and consult with potential users to develop a schedule to load new data.

What are the different architectures of a data warehouse? What are the different approaches to a data warehouse?
There are two main approaches:
  • 5. Top down (Bill Inmon)
Bottom up (Ralph Kimball)

What are the types of a data warehouse?

What is the main difference between the Inmon and Kimball philosophies of data warehousing?
They differ in the concept of building the data warehouse.
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is built from the conformed dimensions of the data marts. Hence a unified view of the enterprise can be obtained from dimensional modeling at a local departmental level.
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.
In short:
Kimball: first data marts, then combined into a data warehouse.
Inmon: first the data warehouse, later the data marts.

When should I consider a data warehouse solution?
What is the process of warehousing data?
Explain the architecture of a data warehouse with a diagram.
What is a staging area?

What is a general purpose scheduling tool?
The basic purpose of the scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.

What is real-time data warehousing?
Real-time data warehousing is a combination of two things:
1. Real-time activity and
2. Data warehousing.
Real-time activity is activity that is happening right now. The activity could be anything, such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into
  • 6. the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.

What is ODS?
ODS means Operational Data Store: a collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure that summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

What is Active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.

An active data warehouse also means that every user can access the database at any time, 24/7.

Active transformation means that data can change as it passes through.

What is meant by OLTP?
OLTP stands for On-Line Transaction Processing. It uses a standard, normalized database structure. OLTP is designed for transactions, i.e., day-to-day transactions. An OLTP database has hundreds of users connected to it. These databases are normalized to reduce the redundancy of the data and increase performance while inserting data. The ratio of records being inserted is higher than the ratio of records being updated or deleted. OLTP systems are not designed for analysis, reporting and decision support. Examples: ATMs, online shopping, online application filing, and online railway reservations.

Why are OLTP database designs not generally a good idea for a Data Warehouse?
  • 7. Because OLTP tables are normalized, query response will be slow for the end user. In addition, OLTP does not contain years of data, so historical analysis is not possible.

Why is de-normalized data now OK when it's used for Decision Support?
Normalization of a relational database for transaction processing avoids processing anomalies and results in the most efficient use of database storage. A data warehouse for decision support is not intended to achieve these same goals. For data-driven decision support, the main concern is to provide information to the user as fast as possible. Because of this, storing data in a de-normalized fashion, including storing redundant data and pre-summarizing data, provides the best retrieval results. Also, data warehouse data is usually static, so anomalies will not occur from operations like adding, deleting or updating a record or field.

Why should you put your data warehouse on a different system than your OLTP system?
An OLTP system is basically "data oriented" (ER model) and not "subject oriented" (dimensional model). That is why we design a separate system that will host a subject-oriented OLAP system. Moreover, a complex query fired on an OLTP system causes a heavy overhead on the OLTP server, which directly affects the day-to-day business.

What is Business Intelligence?
Business intelligence (BI) is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.

What are the important concerns of OLTP and DSS systems?
No. of users: OLTP: many. DSS: few.
  • 8. Data (OLTP):
1. Stored in a complex data format (normalized).
2. Stored in a normalized form, normally third normal form. Normalization enhances performance.
3. Small volumes of data.
4. Data is volatile in nature.
Data (DSS):
1. Stored in multidimensional structures, e.g. a cube (3-dimensional).
2. Stored in de-normalized format.
3. Large volumes of data.
4. Static in nature with periodic loads.
Operations: OLTP: transactions. DSS: reporting.
Indexes: OLTP: few. DSS: many.
Joins: OLTP: many (because it is normalized). DSS: few (because it is de-normalized).
Performance: OLTP: concurrency and availability are the more important aspects, e.g. ATMs. DSS: response time is most important.

The same comparison in summary form:
Data structures: OLTP: complex data structures. DSS: multidimensional data structures.
Indexes: OLTP: few. DSS: many.
Joins: OLTP: many. DSS: some.
Duplicated data: OLTP: normalized DBMS. DSS: de-normalized DBMS.
Derived data and aggregates: OLTP: rare. DSS: common.
Number of users: OLTP: many. DSS: few.
  • 9. Workload: OLTP: predefined operations. DSS: ad-hoc queries.
Data modifications: OLTP: volatile. DSS: updated on a regular basis.
Data: OLTP: small volumes. DSS: large volume (historical data).
Key requirement: OLTP: availability must be high. DSS: response time must be good.

What is the difference between ODS and OLTP?
ODS: It is a collection of tables created in the data warehouse that maintains only current data, whereas OLTP maintains data only for transactions; OLTP systems are designed for recording the daily operations and transactions of a business.

ODS: Holds data alongside the data warehouse as a stand-alone store. No further transactions take place on the current data that is part of the data warehouse. Current data changes only when you upload through ETL on a scheduled basis.
OLTP: Holds data in an online system that is connected to the network, where every update from a transaction happens in seconds. Summarized values change every second.

What is OLAP? What are the types of OLAP?
OLAP is software for manipulating multidimensional data from a variety of sources. The data is often stored in a data warehouse. OLAP software helps a user create queries, views, representations and reports. OLAP tools can provide a "front-end" for a data-driven DSS.

On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

OLAP stands for On-Line Analytical Processing. An OLAP system stores data in multidimensional databases. You then access these databases to perform financial and statistical analysis on different combinations of the data. An OLAP database is generally used to analyze data. It is optimized so that you can quickly retrieve data. An OLAP database is generally created from the information you have put in an OLTP database. OLAP products can be grouped into three categories.
  • 10. MOLAP (Multidimensional OLAP):
o Data is stored in multidimensional arrays in order to be viewed in a multidimensional manner.
o Multidimensional arrays provide efficiency in storage and operations.
o Examples: Oracle Express Server, Essbase by Hyperion Software, PowerPlay by Cognos.
o MOLAP does not support ad-hoc queries because it is optimized for multidimensional operations.
o Retrieval is fast.
o Storage is very efficient.
ROLAP (Relational OLAP):
o Data is stored in a relational model because OLAP capabilities are best provided against the relational database.
o Examples: Oracle, SQL Server, etc.
o ROLAP integrates naturally with existing technology and standards.
o ROLAP can readily take advantage of parallel relational technology.
HOLAP (Hybrid OLAP):
o These products combine MOLAP and ROLAP.
o With HOLAP products, a relational database stores most of the data.
o A separate multidimensional database stores a small portion of the data.

Are OLAP databases called decision support systems? True/false?
True.

What does the term 'Metadata' mean?
Very loosely, it is documentation about data; it is how you provide context for data people might be using. Metadata is basically the wrapping you put around data you use in everyday life to transform it into meaningful information.

What is the difference between data warehousing and OLAP?
The terms data warehousing and OLAP are often used interchangeably. As the definitions suggest, warehousing refers to the organization and storage of data from a variety of sources so that it can be analyzed and retrieved easily. OLAP deals with the software and the process of analyzing data, managing aggregations, and partitioning information into cubes for in-depth analysis, retrieval and visualization. Some vendors are replacing the term OLAP with the terms analytical software and business intelligence.

A data warehouse is the place where the data is stored for analysis, whereas OLAP is the process of analyzing the data, managing aggregations, and partitioning information into cubes for in-depth visualization.

What is OLAP, MOLAP, ROLAP, DOLAP, and HOLAP?
  • 11. OLAP (On-Line Analytical Processing): Designates a category of applications and technologies that allow the collection, storage, manipulation and reproduction of multidimensional data, with the goal of analysis.
MOLAP (Multidimensional OLAP): This term designates a Cartesian data structure more specifically. In effect, MOLAP contrasts with ROLAP. In the former, joins between tables are already in place, which enhances performance. In the latter, joins are computed during the request. It is targeted at groups of users because it's a shared environment. Data is stored in an exclusive server-based format. It performs more complex analysis of data.
ROLAP (Relational OLAP): Designates one or several star schemas stored in relational databases. This technology permits multidimensional analysis with data stored in relational databases. It is used for large departments or groups because it supports large amounts of data and users.
DOLAP (Desktop OLAP): Small OLAP products for local multidimensional analysis. There can be a mini multidimensional database (using Personal Express), or extraction of a data cube (using Business Objects). It is designed for the low-end, single, departmental user. Data is stored in cubes on the desktop. It's like having your own spreadsheet. Since the data is local, end users don't have to worry about performance hits against the server.
HOLAP: Hybridization of OLAP, which can include any of the above.

What is meant by metadata in the context of a data warehouse and how is it important?
Metadata is data about data. A business analyst or data modeler usually captures information about data in a data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also presented at the data mart level, for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis and DSS.

What is the difference between MOLAP and ROLAP?
ROLAP:
• Tactical
• Detailed data
• Simple calculations
• Analyzes past trends
MOLAP:
• Strategic
• Summary data
• Complex calculations
• Predicts future trends
  • 12. Data storage structure: ROLAP: tables. MOLAP: cube.
Advantages: ROLAP: requires less memory storage space. MOLAP: data access is faster.
Disadvantages: ROLAP: data access is slow. MOLAP: requires more memory storage space, and the cube is sparsely filled as the number of dimensions increases.

What is the difference between OLTP and OLAP?
The main differences between OLTP and OLAP are:
1. User and system orientation
OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).
2. Data contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, and stores information at different levels of granularity to support the decision-making process.
3. Database design
OLTP: adopts an entity-relationship (ER) model and an application-oriented database design.
OLAP: adopts a star, snowflake or fact constellation model and a subject-oriented database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.
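The point above about OLAP storing information "at different levels of granularity" can be made concrete with a small sketch. It takes detail-level sales rows and summarizes them by day, month or year. The sample rows, the GRANULARITY mapping and the summarize function are all assumptions invented for this illustration; a real OLAP engine would precompute and store such aggregates rather than recompute them on demand.

```python
from collections import defaultdict

# Hypothetical detail-level sales facts: (ISO date, region, amount).
SALES = [
    ("2022-11-30", "north", 100.0),
    ("2022-12-01", "north", 40.0),
    ("2022-12-01", "south", 60.0),
    ("2023-01-05", "south", 25.0),
]

# Number of leading characters of a "YYYY-MM-DD" date that identify each level.
GRANULARITY = {"year": 4, "month": 7, "day": 10}


def summarize(rows, level):
    """Total the amounts at the requested level of time granularity."""
    width = GRANULARITY[level]
    totals = defaultdict(float)
    for date, _region, amount in rows:
        totals[date[:width]] += amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for level in ("day", "month", "year"):
        print(level, summarize(SALES, level))
```

Moving from "day" to "year" is a roll-up and the reverse direction is a drill-down; an OLTP system, by contrast, would normally work only with the individual detail rows.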
  • 13. What types of Metadata are there and when will they be available?
Metadata will be made available on the Decision Support website as each increment goes live. We have two classifications of metadata: one that is business and one that is technical. Technical metadata is fairly clear-cut: where did the data come from, or how was it transformed along the way? Business metadata deals more with the possible meaning of the data and how it can be used.

Why is Metadata important to the DWH User?
Metadata is what makes the data in the Data Warehouse meaningful. The Data Warehouse is very different from an operational application. When you're using an operational application, you can get clues from the screen that tell you to update a particular field on the window. If I'm processing a new employee, I know exactly what needs to be updated for that new employee record, and can move through the process based on the context that the application provides. In a data-warehousing environment, you don't have that context or workflow. You have data that is interrelated, and it is raw out there in a form, but there is no application between you and the data. Basically, you have a number of tables and structures that you have access to without a business layer, without a definition on top of it. So metadata is very important to be able to provide that context to people so they know how to go between subject areas, or how data within a subject area is related and what it defines and represents.

Is Metadata a description of what the data represents?
In the simplest terms it is. As an example, if a user of the Data Warehouse is interested in a field called "campus code", then the metadata might have a definition of what the campus code represents, such as "an indicator for one of the three campuses". That is a form of metadata, although it is not a complete picture of what metadata can be.

What types of Metadata will be made available to the User?
Decision Support has identified several kinds of metadata that will be published on the website. Some basic categories are the data model, source-to-target mapping, and the logical and physical models. The logical model gives more of a grouping, or identifies logically what would be expected from the business side. The physical model goes into more detail with more of the data dictionary definition, but it gives the user a pictorial representation of the data, not just a list of columns and tables. It provides a visual so people can see how data elements relate to each other. There is also a category of metadata that we call usage notes. These expand on how someone might query the Data Warehouse or use a query against a data mart. Based on going through the requirements process and working with the focus groups, as data is available, we expect to expand the metadata categories.
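To show how the business and technical metadata described above might sit together for a single field, here is a minimal sketch of a metadata catalog entry for the "campus code" example. The FieldMetadata class, the CATALOG dictionary, the describe function, and the source and transformation values are hypothetical, made up for this illustration; only the business definition comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    name: str
    definition: str       # business metadata: what the field means
    source: str           # technical metadata: where the data came from
    transformation: str   # technical metadata: how it was changed on the way
    usage_note: str = ""  # hint on how someone might query the field


CATALOG = {
    "campus_code": FieldMetadata(
        name="campus_code",
        definition="An indicator for one of the three campuses.",
        source="student_system.CAMPUS_CD",          # hypothetical source field
        transformation="trimmed and upper-cased",   # hypothetical rule
        usage_note="Group by this field to compare campuses.",
    ),
}


def describe(field_name):
    """Return a one-line description of a field, or a notice if none exists."""
    meta = CATALOG.get(field_name)
    if meta is None:
        return f"{field_name}: no metadata published"
    return (
        f"{meta.name}: {meta.definition} "
        f"(source: {meta.source}; transformation: {meta.transformation})"
    )


if __name__ == "__main__":
    print(describe("campus_code"))
    print(describe("enrollment_status"))
```

An ad hoc user would only need the definition and usage note, while technical staff would care more about the source and transformation, which is why keeping both kinds of metadata on the same entry can be convenient.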
  • 14. Is Metadata also useful to the average User of the DWH, in addition to a department's technical staff?
Yes. For an "ad hoc" user, there may be questions as to what a field represents. Another form of metadata at a business user level would be sample queries that Decision Support's Services area would publish based on findings from the requirements process and focus groups. These queries provide samples of relating data to answer a business question.

What Challenges are involved when providing Metadata?
Historically, organizations find it a challenge to manage metadata over time. So I think the biggest challenge that we face at Decision Support is learning from those mistakes and from what we've read in the industry. We need to make sure the metadata we have is "live"; that it's not something that is static and put on the shelf. Decision Support has formed a Custodial Data Council that will take ownership in making sure we have business definitions and work with the user community. I think we also need to technically streamline those processes as much as possible, publish the metadata, and make it as consistent as possible.

What is the difference between DWH and BI?
There may be a feature film (movie) without a trailer, but there will be no trailer without a movie. Similarly, data warehousing is a concept related to extracting a client's business data, applying business processing to that data according to user needs, and finally loading the processed data into a database. This database is what we call a warehouse or data warehouse. After the data warehouse is complete, the business user ultimately wants to view the data (as precise, summarized data), but a business person may not know how to access a database (a computer person can access the database with SQL). This is where OLAP tools come in: they help that person access the database. We can call these OLAP tools Business Intelligence tools ("intelligence" in the sense that they generate SQL queries internally and provide many facilities for report developers to format the data and present it in a highly convenient manner). So the data warehouse (movie) is a database, and business intelligence tools (trailers) present the content of that database in an efficient manner.

Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse so that business decisions can be made depending on the result of the analysis.

Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business
  • 15. intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of its business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term "business intelligence" is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office on which to function, but the back office without a business to support makes no sense.

DATA WAREHOUSE: A data warehouse is an integrated, time-variant, subject-oriented and non-volatile collection of data in support of the management decision-making process.
BUSINESS INTELLIGENCE: Business intelligence is the process of extracting the data, converting it into information and then into a knowledge base.

A data warehouse is a database geared towards the business intelligence requirements of an organization. It integrates data from the various operational systems and is typically loaded from these systems at regular intervals.
BI is a category of technologies that allows for gathering, storing, accessing and analyzing data to help business users make better decisions.

To make business analysis effective and efficient, we require a specialized form of storage. This special form of data storage is called a data warehouse, and the process is called data warehousing.
Business intelligence is the mechanism of using data according to the type of industry for predictive analysis, fault finding, process improvement, etc.

What is a Data Dictionary?
A data dictionary is a kind of metadata. A data dictionary explains how data physically resides in an environment. A data dictionary identifies the type of a column: whether it is character or numeric or some other type. It identifies the width of a column as well as the name of the column. Sometimes in data dictionaries you see descriptions; sometimes you don't. But basically it is how that field is physically represented in Oracle or Sybase or some other platform, if that's where the data resides. It's difficult to do any meaningful query or report without basic metadata.

What are the possible data marts in Retail sales?
Product information, sales information.

What are data validation strategies for data mart validation after the loading process?
  • 16. Data validation is to make sure that the loaded data is accurate and meets the business requirements.
Strategies are the different methods followed to meet the validation requirements.
What is a Data Mart?
A Data Mart is a focused subset of a DWH that deals with a single area of data and is organized for quick analysis. It contains the summarized data of the warehouse and is referred to as a High Performance Query Structure. Data marts consist of Materialized Views and Special Indexes. In some businesses these data marts may be maintained within the warehouse, whereas in other scenarios they may be maintained apart from the DWH.
• A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers.
• The systems designed for a particular line of business.
What are Data Marts?
Data Marts are designed to help managers make strategic decisions about their business. Data Marts are subsets of the corporate-wide data that are of value to a specific group of users.
There are two types of Data Marts:
1. Independent data marts – sourced from data captured from OLTP systems, external providers or from data generated locally within a particular department or geographic area.
2. Dependent data marts – sourced directly from enterprise data warehouses.
What are the levels of Data mart?
What are the differences between Database, DATA WAREHOUSE and Data Marts?
A Database is an organized collection of data.
A DWH is a very large database with a special set of tools to extract and cleanse data from operational systems and to analyze data.
A Data Mart is a focused subset of a DWH that deals with a single area of data and is organized for quick analysis.
What is Data Sampling?
What is Data Scrubbing?
  • 17. What is Data Acquisition Process?
What is data mining?
Data mining is a process of extracting hidden trends within a data warehouse. For example, an insurance data warehouse can be used to mine data for the highest-risk people to insure in a certain geographical area.
What is a transformation?
It is a repository object that generates, modifies or passes data.
Transformations: Transformations are the manipulation of data from how it appears in the source systems into another form in the DWH or data mart in a way that enhances or simplifies its meaning. In other words, you transform data into information. This includes the following:
Data Merging: It is a process of standardizing data types and fields. Suppose one source system stores integer data as smallint whereas another stores the same data as decimal. The data from the two source systems needs to be rationalized when moved into the Oracle data format called NUMBER.
Cleansing: It is the process of validating the data brought from multiple sources. This involves identifying any inconsistencies or inaccuracies.
• Eliminating inconsistencies in the data from multiple sources.
• Converting data from different systems into a single consistent data set suitable for analysis.
• Meeting a standard for establishing data elements, codes, domains, formats and naming conventions.
• Correcting data errors and filling in missing data values.
Aggregation: The process whereby multiple detailed values are combined into a single summary value, typically a summation of numbers representing dollars spent or units sold.
Generate summarized data for use in aggregate fact and dimension tables.
What are the advantages of data mining over traditional approaches?
Data mining is used for the estimation of the future.
For example, if we take a company or business organization, by using data mining we can predict the future of the business in terms of revenue, employees, customers, orders, etc.
Traditional approaches use simple algorithms for estimating the future, but they do not give accurate results when compared to data mining.
What is ETL?
  • 18. ETL stands for extraction, transformation and loading.
ETL tools provide developers with an interface for designing source-to-target mappings, transformations and job control parameters.
• Extraction: Take data from an external source and move it to the warehouse pre-processor database.
• Transformation: The transform data task allows point-to-point generating, modifying and transforming of data.
• Loading: The load data task adds records to a database table in a warehouse.
Explain the classification of Tables in a Data warehouse?
What is Fact table?
A fact table contains the measurements, metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the fact table. The fact table also contains the foreign keys for the dimension tables.
Why fact table is in normal form?
Basically the fact table consists of the index keys of the dimension/lookup tables and the measures. So whenever we have the keys in a table, that itself implies that the table is in normal form.
What is a level of Granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to put the sales data in each transaction. Level of granularity then means what detail you are willing to store for each transactional fact: product sales recorded for each minute, or aggregated up to the minute before being stored.
What does level of Granularity of a fact table signify?
Granularity: The first step in designing a fact table is to determine the granularity of the fact table. By granularity, we mean the lowest level of information that will be stored in the fact table. This constitutes two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually go back to the requirements.
What is aggregate fact table?
  • 19. An aggregate table contains the [measure] values, aggregated/grouped/summed up to some level of the hierarchy.
What is fact less fact table? Where you have used it in your project?
A factless fact table contains only keys; no measures are available in it.
What is the common use of creating a Factless Fact Table?
What are the different types of Fact Table? Explain with an example.
1. Cumulative Fact Table:
2. Snapshot Fact Table:
What are the types of Facts?
Additive: A fact that can be summed up across all of the dimensions is called an additive fact.
• The measure can participate in arithmetic calculations using all or any dimensions. Ex: Sales profit
Semi-additive: A fact that can be summed up across some of the dimensions, but not others, is called a semi-additive fact.
• The measure can participate in arithmetic calculations using some dimensions. Ex: account balance (it can be summed across accounts but not across time)
Non-additive: A fact that cannot be summed up across any of the dimensions is called a non-additive fact.
• The measure can’t participate in arithmetic calculations using dimensions. Ex: temperature
What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Snapshot facts are semi-additive; when we maintain aggregated facts we go for semi-additive facts. Ex: Average daily balance
A fact table without numeric fact columns is called a factless fact table. Ex: Promotion Facts
  • 20. (used while maintaining the promotion values of a transaction, e.g. product samples, because this table doesn’t contain any measures).
What are non-additive facts in detail?
A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts.
A dollar value is an additive fact. If we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
As an example of a non-additive fact, take the measured heights of citizens by geographical location: when we roll up city data to state-level data we should not add the heights of the citizens; rather, we may want to use them to derive a count.
What is conformed fact?
Conformed facts are measures that have the same definition and units in every data mart where they appear, so they can be compared across fact tables. Conformed dimensions, by contrast, are the dimensions which can be used across multiple Data Marts in combination with multiple fact tables.
What is a continuously valued fact?
What is Centipede Fact Table?
What is Fact Constellation?
What are the categories of Snapshot Fact Table Grains?
What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.
How are the Dimension tables designed?
Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in the next section).
Change fact table and DW population routines.
What are the Different methods of loading Dimension tables?
  • 21. Conventional Load: Before loading the data, all the table constraints will be checked against the data.
Direct Load (faster loading): All the constraints will be disabled and the data will be loaded directly. Later the data will be checked against the table constraints, and the bad data won’t be indexed.
Can a dimension table contain numeric values?
What is hierarchy relationship in a dimension? Whether it is:
1. 1:1
2. 1:m
3. m:m
What are the different types of dimensions? Explain with examples.
1. Regular Dimensions
2. Shared dimensions
What are the different types of dimension tables? Explain with examples.
Why dimensions are de-normalized in nature?
Can 2 fact tables share same dimension tables?
What is junk dimension?
Junk dimension: Grouping of random flags and text attributes in a dimension and moving them to a separate sub-dimension.
• A dimension which does not change the grain level is called a junk dimension. Grain: the lowest level of reporting.
(Or) The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
(Or) A junk dimension is a convenient grouping of flags and indicators.
What are Conformed Dimensions?
A dimension that is used in more than one cube.
• The use of conformed dimensions and shared measures is the primary way a set of data marts can be united into one consolidated data warehouse.
• Conformed dimensions are dimensions which are common to the cubes (cubes are the schemas that contain fact and dimension tables).
  • 22. Consider Cube-1, which contains F1, D1, D2, D3, and Cube-2, which contains F2, D1, D2, D4, where F denotes a fact and D a dimension. Here D1 and D2 are the conformed dimensions.
• Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. Ex: The Date dimension is connected to all facts, like Sales facts, Inventory facts, etc.
What is degenerated dimension?
Degenerate Dimension: Keeping the control information on the fact table. Ex: Consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.
What is degenerate dimension table?
Degenerate Dimensions: If a table contains values which are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice id, empno.
What is Audit dimension? Explain with an example.
What is a Fact Dimension?
What is a Mini Dimension?
What are Role-playing dimensions?
What is a Mystery Dimension?
How do you connect the facts and dimensions in the tables?
1. Smart Matching columns
2. Manually you can link
Which columns go to the fact table and which columns go the dimension table?
The Primary Key columns of the Tables (Entities) go to the Dimension Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.
What is Associate Table?
What is Bridge Table?
What is cross-reference table?
  • 23. What is Event-Tracking Table?
What is a lookup table?
A lookup table is one which is used when updating a warehouse. When the lookup is placed on the target table (fact table / warehouse) based upon the primary key of the target, it updates the table by allowing only new records or updated records based on the lookup condition.
What is the data type of the surrogate key?
The data type of the surrogate key is either integer or numeric or number.
What is a Schema?
What is a Star Schema?
Star schema is a way of organizing the tables such that we can retrieve results from the database easily and quickly in the warehouse environment. Usually a star schema consists of one or more dimension tables around a fact table, which looks like a star; hence the name.
Differences between star and snowflake schemas?
Star schema: A single fact table with N dimensions.
Snowflake schema: Any dimensions with extended dimensions are known as a snowflake schema.
• Star schema: all dimensions are linked directly to a fact table.
Snowflake schema: dimensions may be interlinked or may have one-to-many relationships with other tables.
What is Snow-Flake Schema?
When do you go for Star Schema? And when do you go for Snow-Flake Schema?
What is the main difference between schema in RDBMS and schemas in Data Warehouse?
RDBMS Schema
  • 24. • Used for OLTP systems
• Traditional and old schema
• Normalized
• Difficult to understand and navigate
• Cannot solve extract and complex problems
• Poorly modeled
DWH Schema
• Used for OLAP systems
• New generation schema
• De-normalized
• Easy to understand and navigate
• Extract and complex problems can be easily solved
• Very good model
Why did you choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?
Because of its de-normalized structure, i.e., the dimension tables are de-normalized. Why de-normalize? The first (and often only) answer is: speed. An OLTP structure is designed for data inserts, updates, and deletes, but not data retrieval. Therefore, we can often squeeze some speed out of it by de-normalizing some of the tables and having queries go against fewer tables. These queries are faster because they perform fewer joins to retrieve the same record set. Joins are also confusing to many end users. By de-normalizing, we can present the user with a view of the data that is far easier for them to understand.
Benefits of STAR SCHEMA:
Far fewer tables.
Designed for analysis across time.
Simplifies joins.
Less database space.
Supports “drilling” in reports.
Flexibility to meet business and technical needs.
Difference between Snow flake and Star Schema. What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?
  • 25. Star schema contains the dimension tables mapped around one or more fact tables. It is a denormalized model. There is no need to use complicated joins, and queries return results quickly.
Snowflake schema: It is the normalized form of the star schema. It contains in-depth joins, because the tables are split into many pieces. We can easily make modifications directly in the tables. We have to use complicated joins, since we have more tables, so there will be some delay in processing the query.
Which is preferable? Star Schema or Snow-Flake Schema?
If you have 2 fact tables connected in the schema, do you know the name of the schema?
What is Galaxy Schema?
What is Multi-Star Schema?
How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
What are slowly changing dimensions?
SCD stands for Slowly Changing Dimensions. Slowly changing dimensions are of three types:
SCD1: only the updated values are maintained.
Ex: when a customer address is modified, we update the existing record with the new address.
SCD2: maintains historical information and current information by using
A) Effective Date
B) Versions
C) Flags
or a combination of these.
SCD3: by adding new columns to the target table we maintain historical information and current information.
• Type-1: Most recent value
Type-2 (full history):
i) Version Number
ii) Flag
  • 26. iii) Date
Type-3: Current and one previous value
• Type 1: the data is overwritten.
Type 2: current, recent and history data should be there.
Type 3: current and recent data should be there.
What is BUS Schema?
A BUS schema is composed of a master suite of conformed dimensions and standardized definitions of facts.
What is hybrid slowly changing dimension?
What are Critical columns?
What is a surrogate key? Why is it used? What is its need? Give an example.
Explain in detail what do you mean by Slicing and Dicing?
Slicing and dicing refers to the ability to combine and re-combine the dimensions to see different slices of the information. Picture slicing a three-dimensional cube of information in order to see what values are contained in the middle layer. Dicing is the ability to view the cube from different perspectives. Slicing and dicing a cube allows an end user to do the same thing with multiple dimensions.
What is a Measure? What are the types of Measures?
How can you create Measures & Dimensions?
Can we group a measure?
What do you mean by Multi-dimensional Analysis?
What is a Grain?
What is Drill-up, Drill-down & Drill-Across?
Differentiate between Level and Category?
A level is a logical subdivision of a dimension.
E.g. if order date is a dimension, the levels are year, quarter, month, week, day, etc.
A category is one of the different instances of a level.
E.g. if year is a level, the categories are 1996, 1997, 1998, etc.
What is a CUBE in data warehousing concept?
  • 27. Cubes are logical representations of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.
What is a Virtual Cube?
Difference between filter and condition?
The parameter is the only difference.
• The difference between Filter and Condition: a condition returns true or false. Ex: if Country = India then ...
A filter will return two types of results:
1. Detail information, which is equivalent to the WHERE clause in a SQL statement
2. Summary information, which is equivalent to the GROUP BY and HAVING clauses in a SQL statement
• In a filter we just create a parameter on which we can filter the fields. But in a condition we can have static functions like: if yes then color it green, if no then color it red, etc. So here we can create conditions for filtering in the report, which means we can apply different filtering functions at the same time by using conditional formatting.
What is snapshot?
You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data. However, you must reconnect to the catalog if you want to refresh the data.
What is a linked cube?
A linked cube is one in which a subset of the data can be analyzed in great detail. The linking ensures that the data in the cubes remains consistent.
What is VLDB?
VLDB stands for Very Large Database.
It is an environment or storage space managed by a relational database management system (RDBMS) consisting of vast quantities of information. VLDB doesn’t refer to the size of the database or the vast amount of information stored. It refers to the window of opportunity to take a backup of the database.
Window of opportunity refers to the time interval; if the DBA is unable to take a backup in the specified time, then the database is considered a VLDB.
What is batch processing?
What is incremental loading?
  • 28. Incremental loading means loading the ongoing changes in the OLTP.
Explain the advantages of RAID 1, 1/0, and 5. What type of RAID setup would you put your TX logs.
Transaction logs write sequentially and don’t need to be read at all. The ideal is to have each on RAID 1/0 because it has much better write performance than RAID 5.
RAID 1 is also better for TX logs and costs less than 1/0 to implement. It has a tad less reliability and performance is a little worse generally speaking.
RAID 5 is best for data generally because of cost and the fact it provides great read capability.
What is BAS? What is the function?
The Business Application Support (BAS) functional area at SLAC provides administrative computing services to the Business Services Division and Human Resources Department. We are responsible for software development and maintenance of the PeopleSoft applications and consultation to customers with their computer-related tasks.
It’s called Broadcast Agent Server. Its function is to run the scheduled jobs or reports, which can be monitored using the Broadcast Agent Console.
What are modeling tools available in the Market?
There are a number of data modeling tools:
Tool Name         Company Name
Erwin             Computer Associates
Embarcadero       Embarcadero Technologies
Rational Rose     IBM Corporation
Power Designer    Sybase Corporation
Oracle Designer   Oracle Corporation
What are the various Reporting tools in the Market?
1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. Microstrategy
5. MS reporting services
  • 29. 6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. Proclarity
• Some of the standard Business Intelligence tools in the market, according to their performance:
1) MICROSTRATEGY
2) BUSINESS OBJECTS, CRYSTAL REPORTS
3) COGNOS REPORT NET
4) MS-OLAP SERVICES
Or
1. Seagate Crystal report
2. SAS
3. Business objects
4. Microstrategy
5. Cognos
6. Microsoft OLAP
7. Hyperion
8. Microsoft integrated services
and some more.
What are the various ETL tools in the Market?
Various ETL tools used in the market are:
Informatica.
Data Stage.
Oracle Warehouse Builder.
Ab Initio.
Data Junction.
Name some of the real time data-warehousing tools?
What is Outsourcing, Offshoring & Insourcing? And what is the difference between them.
  • 30. Outsourcing is not strictly IT. Any function of an organization that is executed by non-employees is essentially an outsourced task.
Insourcing is the use of external resources (not employees of the organization) to accomplish some function, but they are predominantly carrying out the function at the client’s site. So, the function is “sourced” but not “out” sourced. These resources are also typically managed more closely by the client directly, with little management involvement from the supplier.
Offshoring is a subset of outsourcing which is generally understood to involve a country in which costs remain lower than in the client’s country of operations.
While most offshoring situations are indeed an example of outsourcing, for those companies (HP for example) who now own their offshore operations and have folded them into the company, the line gets blurred. In other words, offshoring is not always outsourcing anymore.
What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.
Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:
It maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.
It is simple and easy to understand with a minimum of training.
Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.
What Oracle tools can be used to build and design a warehouse?
What Oracle features can be used to optimize my warehouse system?
What is Data Modeling?
Data modeling represents information in terms of entities, attributes and relationships. It is a visual representation of the information.
What are the different steps for Data Modeling?
1. Define the problem and scope of the problem.
  • 31. 2. Information gathering.
3. Analysis (normalization).
4. Create a logical data model (independent of platform).
5. Decision about the physical platform, like Oracle or SQL etc.
6. Create a physical data model, which is platform specific.
7. Database creation.
What is Dimensional Modeling?
Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables: fact tables and dimension tables. The fact table contains the facts/measurements of the business and the dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated. Data modeling is probably the most labor-intensive and time-consuming part of the development process. Why bother, especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints. The goal of the data model is to make sure that all the data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end users. The data model is also detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term.
Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the users’ requirements.
What is Logical Modeling?
The Logical Model: In Erwin, the logical model is the version of the model that represents all of the logical business requirements of an organization. There are three levels of logical models that are used to capture these requirements:
The Entity Relationship Diagram: A high-level data model that includes all major entities and relationships. The Entity Relationship Diagram does not contain much detail and is often used in the initial planning phase.
The Key Based Model: A model that describes major data structures such as entities, primary keys, and sample attributes.
  • 32. The Fully Attributed Model: A complete model that includes all required entities, attributes, key groups, and relationships.
In Erwin, a logical model can be created in conjunction with the physical model, or independent of the physical model. Logical models can also be derived from other models using the Derive Model Wizard.
In addition, Erwin supports the definition of model objects in a logical model as logical only and in a physical model as physical only. These options allow for the logical model to be fully normalized and for the corresponding physical model to be de-normalized. Erwin also allows for the automatic conversion of many-to-many and supertype/subtype relationships when you change from a logical model to a physical model.
What are the types of Dimensional Modeling?
What is Conceptual Modeling?
What is Physical Modeling?
Comparing Logical and Physical Models in a Logical/Physical Model:
In an Erwin logical/physical model, each model that you create automatically includes both a logical and a physical model. By default, the logical model is closely related to the physical model. If you make a change in the logical model, the change is automatically reflected in the physical model and vice versa.
You can use either the logical model or the physical model to define and document database structures, although the model you use typically depends on the type of work you want to perform. You can use the logical model to represent business information and define business rules in a fully normalized model, while the physical model supports the needs of the database administrator, who focuses on the physical implementation of the model in a database.
Comparing Logical and Physical Model Objects:
Most of the objects in the logical model correspond to a related object in the physical model. For example, the logical model contains entities, attributes, and key groups, which are represented in the physical model as tables, columns, and indexes, respectively.
The following table compares the logical and physical components in an Erwin model.
What is Difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling will have a logical and a physical model, while a dimensional model will have only a physical model.
E-R modeling is used for normalizing the OLTP database design.
  • 33. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
What is Entity, Attribute and Relationship?
Entity: An entity is an object about which an organization wants to maintain information. E.g.: Employee.
Attribute: An object that maintains the information.
Key attribute: A key attribute consists of one or more attributes of an entity which uniquely identify the entity. E.g.: a bank account number identifies an account.
Relationship: Defines the association between different entities: one to one, one to many, many to one, many to many.
What is meant by De-Normalization?
What is the definition of normalized and denormalized view and what are the differences between them?
Normalization is the process of removing redundancies.
Denormalization is the process of allowing redundancies.
Why Denormalization is promoted in Universe Designing?
In a relational data model, for normalization purposes, some lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table called a DIMENSION table, for performance and for slicing data. Because of this merging of tables into one large dimension table, complex intermediate joins are avoided. Dimension tables are directly joined to fact tables. Though redundancy of data occurs in the DIMENSION table, the size of the DIMENSION table is only 15% of that of the FACT table. So denormalization is promoted in Universe Designing.
What is Cardinality?
What is Referential Integrity?
What are Integrity Constraints?
What is the difference between view and materialized view?
View: stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view: stores the results of the SQL in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used. Pros include quick query results.
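The difference between a view and a materialized view described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than Oracle, and because SQLite has no materialized views, the "materialized" side is emulated with an ordinary table built from the same query. The table and column names (sales, v_sales_by_region, mv_sales_by_region) are made up for the example.

```python
import sqlite3

# SQLite has ordinary views but no materialized views, so the "materialized"
# side is emulated with a table built from the same query (an assumption made
# only for illustration; it is not Oracle's materialized view feature).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('East', 100), ('East', 50), ('West', 70);

    -- View: only the SELECT statement is stored; it runs on every access.
    CREATE VIEW v_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- Emulated materialized view: the result set itself is stored.
    CREATE TABLE mv_sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")


def totals(relation):
    """Return {region: total} read from the given view or table."""
    rows = conn.execute(f"SELECT region, total FROM {relation}")
    return dict(rows.fetchall())


# A new sale arrives after both objects were created.
conn.execute("INSERT INTO sales VALUES ('West', 30)")

view_totals = totals("v_sales_by_region")     # query re-runs: sees the new row
stored_totals = totals("mv_sales_by_region")  # stored result: still the old totals

# "Refreshing" the emulated materialized view means rebuilding the stored rows.
conn.executescript("""
    DELETE FROM mv_sales_by_region;
    INSERT INTO mv_sales_by_region
        SELECT region, SUM(amount) FROM sales GROUP BY region;
""")
refreshed_totals = totals("mv_sales_by_region")
```

The view always reflects the base table, while the stored copy stays at its old totals until it is refreshed. That is the trade-off behind the quick query results mentioned above.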
  • 34. What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
1. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.
2. Normalization is the analysis of functional dependency between attributes / data items of user views. It reduces a complex user view to a set of small and stable subgroups of fields / relations.
1NF: Repeating groups must be eliminated, dependencies can be identified, all key attributes are defined, and there are no repeating groups in the table.
2NF: The table is already in 1NF and includes no partial dependencies (no attribute is dependent on a portion of the primary key). It is still possible to exhibit transitive dependency: attributes may be functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.
What is a Table space? What does it contain?
What is a Composite Key or Concatenated Key? What is its use?
What are Unique Identifiers?
What is an Index? What are the types of Indexes?
What do you mean by Partitioned Indexes?
What is partitioning? What are the methods of partitioning?
What is Parallelism?
What are the advantages and disadvantages of reporting directly against the database? Do you always need to copy the data before reporting on it? (Example, real-time & on-demand reporting is a requirement)
There isn’t any need to copy the data before reporting on it as long as the data is clean. But if the data is not clean it should be cleansed, and so you go for an ETL process.
Advantage of reporting directly against the database (OLTP): No need to separately maintain a database for it (space consumption is reduced).
Disadvantage of reporting directly against the database (OLTP): It slows down the process, because an OLTP system is designed for the online application, whereas a data warehouse application needs to do analysis on the same data and hence takes a long time.
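Returning to the normal forms listed at the top of this slide, the step from 2NF to 3NF can be illustrated with a small sketch. It assumes a hypothetical orders_flat table in which customer_city depends on customer_id rather than on the key order_id (a transitive dependency), and it uses Python's built-in sqlite3 module purely so the example is runnable. All table and column names are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: customer_city depends on customer_id, not on the key order_id.
    CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY,
                              customer_id INTEGER, customer_city TEXT);
    INSERT INTO orders_flat VALUES (1, 7, 'Pune'), (2, 7, 'Pune'), (3, 8, 'Delhi');

    -- 3NF decomposition: the transitive dependency moves to its own table.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id));
    INSERT INTO customers SELECT DISTINCT customer_id, customer_city FROM orders_flat;
    INSERT INTO orders SELECT order_id, customer_id FROM orders_flat;
""")

# Joining the two tables reproduces the original rows, so nothing was lost.
rejoined = conn.execute("""
    SELECT o.order_id, o.customer_id, c.customer_city
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
original = conn.execute("SELECT * FROM orders_flat ORDER BY order_id").fetchall()

# The city is now stored once per customer instead of once per order.
city_copies_flat = conn.execute(
    "SELECT COUNT(*) FROM orders_flat WHERE customer_id = 7").fetchone()[0]
city_copies_3nf = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE customer_id = 7").fetchone()[0]
```

After the split, a change of city for customer 7 touches one row instead of every order row. This is the kind of update anomaly that normalization is meant to remove.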
  • 35. What are the most frequent data errors that slow down the data input process?
Data mining is the process of data selection, exploration and building models using vast data stores to uncover previously unknown patterns. What does this mean to you?
You can produce new knowledge to better inform decision makers before they act. Build a model of the real world based on data collected from a variety of sources, including corporate transactions, customer histories and demographics, even external sources such as credit bureaus. Then use this model to produce patterns in the information that can support decision making and predict new business opportunities. Text mining capabilities enable you to apply such analyses to text-based documents. With SAS's rich suite of text processing and analysis tools, you can uncover underlying themes or concepts contained in large document collections, group documents into topical clusters, classify documents into predefined categories and integrate text data with structured data for enriched predictive modeling endeavors.
  • 36. Before you begin, you should know the answers to the following questions:
• What is Data?
• What is a Database?
• What is an RDBMS?
• What is a Data Model?
• Why do we follow Normalization while designing a data model?
• What is an OLTP system?
WHAT IS DATA WAREHOUSING:
• A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
• In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
• A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data which helps the business take organizational decisions.
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
  • 37. Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.
When should an organization create a Data Warehouse?
Once an organization has so much information that it becomes too difficult to extract the meaningful information the business needs to take strategic decisions. The decisions we make using Data Warehouse data affect the entire organization instead of one customer or one employee. Examples of decisions we make in DW: should we continue with a specific product offering to our customers or not? Should we move the customer support department to a different location to save cost?
Data warehouses and OLTP systems have very different requirements. Here are some examples of differences between typical data warehouses and OLTP systems:
• Workload
Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.
• Data modifications
  • 38. A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse. In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.
• Schema design
Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance. OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.
• Typical operations
A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
• Historical data
Data warehouses usually store many months or years of data. This is to support historical analysis. OLTP systems usually store data from only a few weeks or months. The OLTP system stores only as much historical data as is needed to meet the requirements of the current transaction.
END USER OF APPLICATION:
What do you mean by end user in an OLTP system?
• An end user is someone who enters data into the system or reads a particular report from it.
• A bank teller must enter the account number, see the balance, deposit a cheque, etc.
• A customer representative must see the customer information to be more effective.
What kind of information does management want to know? (DW data is primarily used by management.)
  • 39. • Which are our lowest/highest margin customers?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?
• Who are my customers and what products are they buying?
In OLTP applications, end users are individuals who take care of day-to-day operations.
In DW applications, end users are managers and above who take decisions based on trends, history, predictions, etc.
If end users are not satisfied with the application, then the product is considered a failure even if it is a great achievement technology-wise.
Data Warehouse Architecture:
Source Data:
  • 40. An organization will have many OLTP applications; all this operational data becomes the source for the Data Warehouse database.
ETL: (Extract, Transform and Load)
We extract data from various operational systems and clean the data so that we keep only the information that makes sense to have in the Data Warehouse. While cleansing the data we may reject some records or fill in missing information. Once we have transformed the operational data into the format the DW expects, we load the data into the DW. This process takes most of the time while developing DW applications.
DW Database
This is the area where we store the data which is required by the business, so that they can run any report against the data. In data warehouses we have current and history information, which is very useful for trend analysis, behavioral analysis, etc.
What is Data Mart?
A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales or Finance or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources. The sources could be internal operational systems, a central data warehouse, or external data.
Difference between Data Warehouse and Data Mart
Data Warehouse: Enterprise-wide
Data Mart: Departmental
Data Warehouse: Structure for corporate view of data
Data Mart: Star Schema based (Facts and dimensions)
Data Warehouse: Organized E-R Model or Galaxy of Star (multiple Star schemas in the Data Model)
Data Mart: Quick turn around (up and running quickly, as there are fewer stakeholders)
Data Warehouse: Long turn around time
Data Granularity
  • 41. What is Granularity of your DW?
• Granularity is the level of detail we want to store in the data warehouse.
• For a retail store, Point of Sale (POS) is the lowest granularity information available.
• For banking it is the account-level detail based on everyday transactions.
• As DSS leans towards analyzing the data as a whole, the data warehouse will not necessarily hold all the details down to daily transactions. Possible levels are:
  Daily sales by date, product and customer
  Weekly sales by product and customer
  Monthly sales by product and customer
  Quarterly sales by product and customer
  Yearly sales by product and customer
Usually in the enterprise Data Warehouse (EDW) we tend to keep POS-level data, whereas in Data marts we keep it aggregated by week or month; this way the detailed information is never lost. This detail-level data can be used to find the micro behaviors of our customers (especially in Data Mining).
Data Warehousing Objects:
Data warehousing consists of two main objects:
• Fact
• Dimension
Fact Tables:
A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales. Non-additive facts cannot be added at all. An example of this is averages. Semi-additive
  • 42. facts can be aggregated along some of the dimensions and not along others. An example of this is inventory levels, where you cannot tell what a level means simply by looking at it.
Dimension Tables:
A dimension is a structure, often composed of one or more hierarchies, that categorizes data. Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis. These natural rollups or aggregations within a dimension table are called hierarchies.
Hierarchies:
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies: one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
[Figure: time hierarchy with the levels YEAR, QUARTER, MONTH, WEEK]
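The way an additive fact rolls up along a dimension hierarchy can be sketched as follows. This is an illustration only: SQLite (through Python's sqlite3 module) stands in for the warehouse database, and the tables time_dim and sales_fact, their columns and the sample figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per month, carrying its parent levels.
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY,
                           month TEXT, quarter TEXT, year INTEGER);
    -- Fact table: a foreign key to the dimension plus an additive measure.
    CREATE TABLE sales_fact (time_key INTEGER REFERENCES time_dim (time_key),
                             amount INTEGER);
    INSERT INTO time_dim VALUES (1, 'JAN', 'Q1', 2006),
                                (2, 'FEB', 'Q1', 2006),
                                (3, 'APR', 'Q2', 2006);
    INSERT INTO sales_fact VALUES (1, 100), (1, 40), (2, 60), (3, 200);
""")


def rollup(level):
    # level is spliced into the SQL text, so it is checked against the
    # fixed list of hierarchy columns first.
    if level not in ("month", "quarter", "year"):
        raise ValueError(level)
    rows = conn.execute(f"""
        SELECT d.{level}, SUM(f.amount)
        FROM sales_fact f JOIN time_dim d ON d.time_key = f.time_key
        GROUP BY d.{level}
    """).fetchall()
    return dict(rows)


by_month = rollup("month")
by_quarter = rollup("quarter")
by_year = rollup("year")
print(by_month, by_quarter, by_year)
```

Moving from month to quarter to year is a drill-up along the hierarchy; the same fact rows are simply grouped by a coarser level of the dimension.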
  • 43. How to handle Slowly Changing Dimensions (SCDs) in data model design?
Posted by Dylan Wan on January 13, 2007
There are multiple methods to handle slowly changing dimensions. Which technique to use depends on your business requirements. The choice among these three methods is not a technical design decision, since their behaviors are different.
Type One: Overwrite the old data with new data
Using this method, you do not store the history. For example, let's say each customer can have one salesrep at any given point in time. When the salesrep of ABC Inc. changes from Sandy to Laura, the fact that Sandy was a salesrep of ABC will not be kept anywhere. Any report by salesrep will assume that Laura has been the salesrep of ABC Inc. forever and count all the sales done by Sandy as Laura's.
The above example may not sound like it makes business sense. However, if you only report the sales of the current period, and the salesrep does not change during the period, this method is acceptable.
Many OLTP tables do not need to track the history of changes, and thus this method may be used by the source application. However, if you want to report historical data, the data warehouse can still use other methods to track the history even if your OLTP system does not.
Type Two: Add a new record at the time of the change
Using this method, all prior history is saved. There are three alternative methods to model the key of this table.
Method A - No surrogate key - Use timestamp
When a change happens, a new record is added into the table. All the attributes are copied from the previous record except the changed values. The natural key is copied as well, so the timestamp is used to differentiate the records.
When a fact table is joined with the dimension and you are interested in the historical data, the timestamp is used as part of the join condition. To ease the join, the record typically uses two date columns: the effective start date and the effective end date.
  • 44. Method B - No surrogate key - Use version number
Instead of using the date column, a version number is used to differentiate the different versions of the records.
This technique requires the fact table to store both the natural key and the version number in order to retrieve a given version of the dimension data.
Method C - Use a surrogate key
When an attribute is changed, a new record with a sequence-generated key is added; the fact table also uses this key column as the foreign key.
Type Three: Track changes using a separate column
Using this method, you use a separate column of the dimension table to store the values of the previous year, in addition to the current year data.
This method does not track all the history, but just one prior version.
If the data is changed, the old value needs to be moved from the current value column to the prior column, and the new value overwrites the current column.
This method is used when the changes are not random but happen at a predefined interval, such as annually.
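A minimal sketch of Type Two with a surrogate key (Method C), reusing the Sandy/Laura example from the Type One discussion. It assumes SQLite (through Python's sqlite3 module) as a stand-in for the warehouse database; the table and column names, the dates and the sales amounts are hypothetical. The effective start and end dates from Method A are kept as well, since they make the current row easy to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT,                               -- natural key
        salesrep     TEXT,
        start_date   TEXT,
        end_date     TEXT                                -- NULL = current row
    );
    CREATE TABLE sales_fact (customer_key INTEGER, amount INTEGER);
""")


def current_key(customer_id):
    # The current version of a customer is the row that is not yet closed.
    row = conn.execute(
        "SELECT customer_key FROM customer_dim "
        "WHERE customer_id = ? AND end_date IS NULL",
        (customer_id,)).fetchone()
    return row[0] if row else None


def change_salesrep(customer_id, salesrep, change_date):
    # Type Two: close the current row, then add a new row with a new key.
    conn.execute(
        "UPDATE customer_dim SET end_date = ? "
        "WHERE customer_id = ? AND end_date IS NULL",
        (change_date, customer_id))
    conn.execute(
        "INSERT INTO customer_dim (customer_id, salesrep, start_date) "
        "VALUES (?, ?, ?)", (customer_id, salesrep, change_date))


def record_sale(customer_id, amount):
    # The fact row stores the surrogate key that is current at load time.
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)",
                 (current_key(customer_id), amount))


change_salesrep("ABC", "Sandy", "2006-01-01")
record_sale("ABC", 500)
change_salesrep("ABC", "Laura", "2007-01-01")
record_sale("ABC", 300)

sales_by_rep = dict(conn.execute("""
    SELECT d.salesrep, SUM(f.amount)
    FROM sales_fact f JOIN customer_dim d ON d.customer_key = f.customer_key
    GROUP BY d.salesrep
""").fetchall())
print(sales_by_rep)
```

Because each fact row points at the dimension version that was current when the sale happened, Sandy's sales stay with Sandy, which is exactly what Type One loses.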
  • 45. Structured Query Language
SQL is a database language used to create, manipulate and control access to database objects. SQL is a non-procedural language used to access relational databases. It is a flexible, efficient language with features designed to manipulate and examine relational data.
SQL is only used for definition and manipulation of database objects. It cannot be used for application development such as form definitions, creation of procedures, etc. For that you need a 3GL such as COBOL or a 4GL such as dBase to provide front-end support to the database.
Key features of SQL are:
• Non-procedural language
• Unified language
• Common language for all relational databases (syntax may change between different RDBMS)
SQL is made of three sub-languages:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
Data Definition Language (DDL): Allows you to define database objects at the conceptual level. It consists of commands to create objects and alter the structure of objects, such as tables, views, indexes, etc. Commonly used DDL statements are CREATE, DROP, etc.
If you want to create a table STUDENT, use the following syntax:
CREATE TABLE STUDENT (
  STUDENT_ID INTEGER PRIMARY KEY,
  STUDENT_NM VARCHAR(30),
  COURSE_ID VARCHAR(15),
  PHONE VARCHAR(12),
  ADDRESS VARCHAR(50) );
To drop a table from the database:
DROP TABLE STUDENT;
Data Manipulation Language (DML): Allows you to retrieve or update data within a database. It is used for query, insertion, deletion and updating of information stored in databases. E.g.: Select, Insert, Update, Delete.
  • 46.
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE         ADDRESS
1001        JAMES       Oracle        972-888-9018  888, North Central Exp, Dallas, TX - 75089
1002        JIM         MSSql Server  972-678-8909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-571-1567  1234, Elm Street, Dallas, TX - 75039
Select statement:
The Select statement in SQL is used to display certain data from the table. For example, suppose you want to know what course Jim is taking. The Select statement fetches the information you want when you supply the information you have. In this scenario the information you have is student_nm = 'JIM' and the information you want is course_id; the intersection of that row and that column in the table is what you are looking for.
SELECT (what you want)
FROM (which tables)
WHERE (what you have)
Now the select statement to find Jim's course_id looks like this:
SELECT COURSE_ID
FROM STUDENT
WHERE STUDENT_NM = 'JIM';
You will get the result as:
COURSE_ID
MSSql Server
If you want to see all the rows in the table then your select will be:
SELECT * FROM STUDENT;
If you would like to show student_nm and address of whoever is attending the Oracle course, in the form of a report, then your select will look like:
SELECT STUDENT_NM, ADDRESS
FROM STUDENT
WHERE COURSE_ID = 'Oracle';
The result will be
STUDENT_NM  ADDRESS
  • 47. JAMES       888, North Central Exp, Dallas, TX - 75089
Insert Statement
The Insert statement is used to insert a new row into the table. For example, if a new student DAVE is joining the Java course, use the INSERT SQL statement:
INSERT INTO STUDENT (STUDENT_ID, STUDENT_NM, COURSE_ID, PHONE, ADDRESS)
VALUES (1004, 'DAVE', 'Java', '972-912-4008', '567, Washington Ave, Dallas - 75543');
After executing the insert statement, your table should look like below when you issue a select from the student table:
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE         ADDRESS
1001        JAMES       Oracle        972-888-9018  888, North Central Exp, Dallas, TX - 75089
1002        JIM         MSSql Server  972-678-8909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-571-1567  1234, Elm Street, Dallas, TX - 75039
1004        DAVE        Java          972-912-4008  567, Washington Ave, Dallas - 75543
Update Statement
The Update statement is used to change existing information in the table. For example, if DAVE moved to another address then we need to change the ADDRESS column of DAVE's record. If the new address is 146, Dallas Parkway, Dallas - 75240 then your update should be:
UPDATE STUDENT SET ADDRESS = '146, Dallas Parkway, Dallas - 75240'
WHERE STUDENT_NM = 'DAVE';
In order to make sure you updated the Address column for DAVE, issue the following SQL:
SELECT * FROM STUDENT WHERE STUDENT_NM = 'DAVE';
Then you should see the following result:
STUDENT_ID  STUDENT_NM  COURSE_ID  PHONE         ADDRESS
1004        DAVE        Java       972-912-4008  146, Dallas Parkway, Dallas - 75240
Delete Statement
  • 48. is used to delete a row from the table, i.e. remove records from the table. For example, JAMES moved to a different city and does not want to take the course. In order to remove JAMES's record from the table we use the DELETE statement:
DELETE FROM STUDENT
WHERE STUDENT_NM = 'JAMES';
Once you delete the record and select all the information from the student table, you should see the following information:
STUDENT_ID  STUDENT_NM  COURSE_ID     PHONE         ADDRESS
1002        JIM         MSSql Server  972-678-8909  567, Preston Road, Dallas, TX - 75240
1003        BRUCE       Java          214-571-1567  1234, Elm Street, Dallas, TX - 75039
1004        DAVE        Java          972-912-4008  567, Washington Ave, Dallas - 75543
If you don't include a where clause in the delete statement then it will remove all the rows from the table.
Data Control Language (DCL)
In an RDBMS one of the main advantages is the security of the data in the database. You can allow a user to do a specific operation or all operations on certain objects. Examples of DCL statements are the GRANT and REVOKE statements.
GRANT is used to grant a permission to a user so that the user can do that operation.
REVOKE is used to take back that permission from that user on that object.
For example, we have two users JAMES and DAVID.
If JAMES created a table called ITEMS then JAMES becomes the owner of that table.
DAVID cannot access the ITEMS table because he is not the owner of that table.
DAVID can access ITEMS if JAMES gives him permission on his table.
JAMES can give different types of access, like Select, Update, Delete and Insert, on the ITEMS table to DAVID.
For example:
If JAMES wants to provide only Select on ITEMS to DAVID then he can issue:
GRANT SELECT ON ITEMS TO DAVID;
If JAMES wants to provide only Select and Insert on ITEMS to DAVID then he can issue:
GRANT SELECT, INSERT ON ITEMS TO DAVID;
If JAMES wants to provide all the operations on ITEMS to DAVID then he can issue:
GRANT ALL ON ITEMS TO DAVID;
Once you provide all permissions on an object to a user, that user can do any manipulation on the table almost as if he were the owner, although ownership itself stays with JAMES.
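The SELECT, INSERT, UPDATE and DELETE statements on the STUDENT table from the preceding slides can be replayed end to end. The sketch below assumes SQLite (through Python's sqlite3 module) in place of Oracle, and the phone numbers are reassembled from the split values in the slide tables. GRANT and REVOKE are not shown because SQLite has no user privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        student_nm VARCHAR(30),
        course_id  VARCHAR(15),
        phone      VARCHAR(12),
        address    VARCHAR(50)
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    (1001, "JAMES", "Oracle", "972-888-9018",
     "888, North Central Exp, Dallas, TX - 75089"),
    (1002, "JIM", "MSSql Server", "972-678-8909",
     "567, Preston Road, Dallas, TX - 75240"),
    (1003, "BRUCE", "Java", "214-571-1567",
     "1234, Elm Street, Dallas, TX - 75039"),
])

# SELECT: what you want (course_id), given what you have (student_nm).
jim_course = conn.execute(
    "SELECT course_id FROM student WHERE student_nm = ?",
    ("JIM",)).fetchone()[0]

# INSERT: a new student joins the Java course.
conn.execute(
    "INSERT INTO student (student_id, student_nm, course_id, phone, address) "
    "VALUES (?, ?, ?, ?, ?)",
    (1004, "DAVE", "Java", "972-912-4008",
     "567, Washington Ave, Dallas - 75543"))

# UPDATE: DAVE moves; the WHERE clause limits the change to his row.
conn.execute("UPDATE student SET address = ? WHERE student_nm = ?",
             ("146, Dallas Parkway, Dallas - 75240", "DAVE"))
dave_address = conn.execute(
    "SELECT address FROM student WHERE student_nm = ?",
    ("DAVE",)).fetchone()[0]

# DELETE: JAMES leaves; without the WHERE clause every row would go.
conn.execute("DELETE FROM student WHERE student_nm = ?", ("JAMES",))
remaining = [row[0] for row in conn.execute(
    "SELECT student_nm FROM student ORDER BY student_id")]
print(jim_course, dave_address, remaining)
```

Note that string values are passed as bound parameters here; when typing the statements directly into a SQL prompt they must be wrapped in single quotes, as in WHERE STUDENT_NM = 'JIM'.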
  • 49. Oracle datatypes
Data in a database is stored in the form of tables. Each table consists of rows and columns to store the data.
A particular column in a table must contain the same type of data.
For example:
PLAYER_NAME (char)  COUNTRY (char)  DATE_OF_BIRTH (date)  ROOM_NO (number)
AGASSI              USA             10/12/1969            1004
WILLIAM             USA             01/15/1975            1006
JIM                 RUSSIA          05/25/1980            1007
HINGIS              SWITZERLAND     06/25/1979            1009
Every column holds a certain kind of information: PLAYER_NAME is a char column, DATE_OF_BIRTH is a date column, ROOM_NO is a number column.
Different datatypes available in Oracle database:
CHAR: To store character data, for example the name of a person (you can save anything in a character field).
VARCHAR: Same as CHAR. The only difference between CHAR and VARCHAR is the way the database saves the data.
To understand the difference better we will take the following example.
CREATE TABLE EMPLOYEE (EMP_NO NUMBER(4), ENAME CHAR(15))
EMP_NO  ENAME
888     CLARK
889     KING
890     DAVID COOPER
As the Ename column is defined as CHAR(15), every value you put in that column will occupy all 15 bytes, i.e. CLARK is a 5-byte string, so the database pads it with 10 spaces.
CREATE TABLE EMPLOYEE (EMP_NO NUMBER(4), ENAME VARCHAR(15))
EMP_NO  ENAME
888     CLARK
889     KING
890     DAVID COOPER
  • 50. Here, as Ename is defined as VARCHAR(15), it occupies only the required space, so in the above table the ename CLARK occupies only 5 bytes in the database.
So what are the advantages and disadvantages? The rule of thumb here is that if you are using a character column as a primary key then it had better be a char field. If you are using a column to hold comments then you should use varchar.
NUMBER: Used to store numbers. For example, if you want to store employee numbers then you define the column's data type as number. If you want to define a column to store currency then you can define the column as NUMBER(7,2).
DATE: Used to store dates, like the date of birth of a person, the join date in a company, etc.
LONG: Stores character data of variable length.
RAW, LONG RAW: Store binary data of variable length.
LOB: Large objects to store binary files.
In addition Oracle 8 supports CLOB, BLOB and BFILE:
CLOB - A table can have multiple columns of this type.
BLOB - Can store large binary objects such as graphics, video and sound files.
BFILE - Stores file pointers to LOBs managed by file systems external to the database.
Constraints
When you bind a business rule to a column in a table, those rules are called Constraints. Constraints are defined while creating the table. Say, for example, you cannot have an employee who does not have a name; then the employee name column in the employee table should be a NOT NULL column. The NOT NULL is a constraint.
The following table shows the constraint types and short descriptions.
Constraint Type  Description
NOT NULL         You must provide a value in that column; you cannot leave that column blank.
PRIMARY KEY      No duplicate values allowed; for example, Empno in the Employee table
  • 51. should be unique.
CHECK            Checks the value and controls the values being inserted and updated.
DEFAULT          Assigns a default value if no value is given.
REFERENCES       Maintains the referential integrity (Foreign Key).
Examples of some of the constraints usually used to implement business rules:
NOT NULL
If we have a business rule saying that all customers should have a name, we cannot have any customer without a name. So to implement that business rule we can create the customer table and specify the customer name column as NOT NULL (constraint).
Example
CREATE TABLE EMPLOYEE (EMPNO NUMBER(4) PRIMARY KEY, ENAME VARCHAR(4) NOT NULL);
CHECK
The Check constraint is used where we define a condition on a column. It consists of the keyword CHECK followed by the condition:
col_name datatype CHECK (col_name in (value1, value2))
Example
If you have a business rule saying that all employees in the organization should get at least $500 then we can use a CHECK constraint while creating the table.
CREATE TABLE EMPLOYEE (EMPNO NUMBER(4) PRIMARY KEY, ENAME VARCHAR(4) NOT NULL, SALARY NUMBER(7,2) CHECK (SALARY >= 500));
DEFAULT
When a row is inserted into a table without values for every column, SQL must insert a default value to fill in the excluded columns, or the command will be rejected. The most common default value is NULL. This can be used with columns not defined as NOT NULL.
A default value is assigned to a column while creating the table using the CREATE TABLE operation.
Example
CREATE TABLE ITEM (ITEM_ID NUMBER(4) PRIMARY KEY, ITEM_NAME VARCHAR(15), ITEM_DESC VARCHAR(100), QOH NUMBER(4) DEFAULT 100)
Assigning a default value of 0 to numeric columns makes computations easier, because arithmetic on a NULL value yields NULL.
PRIMARY KEY
The Primary Key in a table is a unique identifier of a row. For example, if you are maintaining customer profiles, you should assign a particular number to each one. So customer_number should be defined as the Primary key in the Customer table.
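The NOT NULL, CHECK, DEFAULT and PRIMARY KEY constraints described so far can be watched in action with a short sketch. It assumes SQLite (through Python's sqlite3 module) rather than Oracle, so the datatypes differ slightly, and the bonus column is a hypothetical addition used only to show DEFAULT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        empno  INTEGER PRIMARY KEY,
        ename  VARCHAR(15) NOT NULL,
        salary NUMERIC CHECK (salary >= 500),
        bonus  NUMERIC DEFAULT 0          -- hypothetical column
    )
""")


def try_insert(empno, ename, salary):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee (empno, ename, salary) VALUES (?, ?, ?)",
            (empno, ename, salary))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, "CLARK", 900)    # satisfies every rule
no_name = try_insert(2, None, 900)        # violates NOT NULL
low_pay = try_insert(3, "KING", 100)      # violates CHECK
duplicate = try_insert(1, "DAVID", 900)   # violates PRIMARY KEY

# bonus was left out of the INSERT, so the DEFAULT value was stored.
default_bonus = conn.execute(
    "SELECT bonus FROM employee WHERE empno = 1").fetchone()[0]
print(accepted, no_name, low_pay, duplicate, default_bonus)
```

The database rejects the offending rows itself, so the business rules hold no matter which application performs the insert.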
  • 52. REFERENCES
This is a Foreign key. A foreign key column value refers to a column in another table to check whether the value exists or not.
UNIQUE
The values entered into a column are unique, i.e. no duplicate values exist. This constraint assures the business that no duplicates are allowed.
Data Definition Language
It is the part of the SQL language which creates database objects. Examples of database objects are tables, procedures, functions, packages, etc. When you create a table or drop a table you are modifying the structure of the database, and that is the reason why it is called data definition language. When you issue a create, alter or drop SQL statement the database internally does a commit, and that is why we cannot include DDL as part of a transaction.
Following are a few DDL statements.
Create table
Create table course (
  course_id number(5) not null primary key,
  course_name varchar2(30) not null,
  start_date date );
Alter table course modify ( start_date date not null );
Alter table course add ( instructor_id number(5) null );
(A datatype is required for the new column; number(5) is assumed here.)
Drop table course;
Create table course (
  course_id number(5) not null primary key,
  course_name varchar(30),
  start_date date )
  tablespace course_info
  storage ( initial 1024k next 1024 pctincrease 10 );
Data Manipulation Language
Data Manipulation in an RDBMS means maintaining the data in the database. There are three DML statements: Insert, Update and Delete. The INSERT statement is used to insert a new record into a table. The UPDATE statement is used to change the existing information of a table. The DELETE statement is used to remove certain information from the table.
We will take an example here: if you are running an apartment complex where you rent apartments, the day-to-day record maintenance would look like this.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1000       888    SMITH        881-890-9000  767-908-5432  900       1
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
INSERT Statement
  • 53. If a person named JAMES rented an apartment, we need to add his information to the table. We have to do an INSERT because the information does not exist in the table as of now.
The following information has to be entered into the database:
name = JAMES, aptno = 891, home_phone = 676-789-9011, work_phone = 777-567-1234, apt_rent = 880 and no_of_pets = 1.
So now we can write the INSERT statement:
INSERT into TENANT (tenant_id, aptno, tenant_name, home_phone, work_phone, apt_rent, no_of_pets)
VALUES (1003, 891, 'JAMES', '676-789-9011', '777-567-1234', 880, 1);
After executing the insert statement the table should now have four rows as shown below.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1000       888    SMITH        881-890-9000  767-908-5432  900       1
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
1003       891    JAMES        676-789-9011  777-567-1234  880       1
The different INSERT SQL syntaxes available are shown below.
Syntax 1
INSERT into table_name (col1, col2, col3....) values (value1, value2, value3.....)
In Syntax 1 we specify the column names of the table and the corresponding values. In application development it is recommended to use this syntax for inserts, because if a column is later added to the table the statement will not give an error: the new column simply receives no value and the program keeps running (provided the new column allows NULL or has a default).
Syntax 2
INSERT into table_name values (value1, value2.....)
In Syntax 2 we don't specify the column names and pass values for all the columns in order.
Syntax 3
INSERT into table_name (col1, col2, col3...)
SELECT col1, col2, col3........ FROM table
With Syntax 3 we can insert multiple rows using one INSERT statement, whereas with Syntax 1 and Syntax 2 you can insert only one row at a time.
UPDATE Statement
  • 54. Now we will go to the next DML statement, UPDATE. Update is used to change the existing value in a column of a table. As JAMES's work_phone number changed from 777-567-1234 to 765-123-9087, we need to change that information in JAMES's record in the table.
UPDATE TENANT SET work_phone = '765-123-9087'
WHERE tenant_name = 'JAMES';
After executing the UPDATE statement the table should still have four rows, with the new work_phone for JAMES, as shown below.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1000       888    SMITH        881-890-9000  767-908-5432  900       1
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
1003       891    JAMES        676-789-9011  765-123-9087  880       1
Syntax
UPDATE table_name SET colname1 = Value1, colname2 = Value2.......
[WHERE clause]
If you don't include a WHERE clause in your UPDATE statement then it will update all the rows in the table, so you should be very careful when you are writing UPDATE statements at work.
DELETE Statement
SMITH moves out of the apartment complex, so now we do not need to have his information in the table. You can use the DELETE SQL statement.
DELETE FROM TENANT
WHERE tenant_name = 'SMITH';
After executing the DELETE statement the table should now have three rows as shown below.
tenant_id  aptno  tenant_name  home_phone    work_phone    apt_rent  no_of_pets
1001       889    STEVE        881-909-8971  898-543-9032  890       0
1002       890    BILL         781-897-9011  567-891-9108  880       2
1003       891    JAMES        676-789-9011  765-123-9087  880       1
Syntax
  • 55. DELETE FROM table_name
[WHERE clause]
If you don't include a WHERE clause in your DELETE statement then you delete all the rows in the table, so you should be very careful when you are writing DELETE statements at work.
Some examples of INSERT, UPDATE and DELETE statements follow.
Insert SQL examples
Example 1
INSERT into BOOKS (book_id, book_nm, author, price) values (234, 'Oracle', 'Smith', 45);
Example 2
INSERT into BOOKS values (235, 'C++', 'Austin', 50);
Example 3
INSERT into BOOKS (book_id, book_nm, author, price)
SELECT book_no, book_name, author_name, book_price FROM legacy_books
WHERE author_name = 'BILL';
Update SQL Examples
Example 1
UPDATE BOOKS SET book_nm = 'C++ for Experts';
Example 2
UPDATE BOOKS SET book_nm = 'Oracle'
WHERE book_id = 103;
Example 3
UPDATE BOOKS SET price = price - 5
WHERE author in (SELECT author FROM authors WHERE state = 'CA');
Example 4
UPDATE BOOKS SET price = price - 2
WHERE exists (SELECT author FROM authors WHERE books.author = authors.author);
DELETE SQL Examples
Example 1
DELETE FROM BOOKS;
Example 2
DELETE FROM BOOKS WHERE book_id = 235;
  • 56. Example 3
    DELETE FROM BOOKS WHERE author IN ( SELECT author FROM authors WHERE state = 'TX' );
Create a table called PATIENT so that we can practise data manipulation with INSERT, UPDATE and DELETE statements.
    Patient_id      Number(4)    Primary Key
    Patient_name    Varchar(35)  Not Null
    Primary_doctor  Number(4)    Foreign Key
    Patient_dob     Date         Not Null
    Patient_phone   Char(10)     NULL
Using INSERT statements, insert the following rows into the PATIENT table.
    PATIENT_ID  PATIENT_NAME  PRIMARY_DOC  PATIENT_DOB  PATIENT_PHONE
    1500        SMITH         ABDUL        10/10/1964   312-896-9632
    1501        KTMAN         JON          02/02/1960   312-666-1478
    1502        WATER         ABDUL        03/03/1955   312-885-9632
    1503        MARINO        JON          09/02/1975   312-555-7412
    1504        DAWKINS       DUPOINT      05/07/1978   312-951-7532
Change the patient name SMITH to RODMAN for the patient whose dob is 10/10/1964 and whose primary doctor is ABDUL.
Change the phone number of WATER from 312-885-9632 to 312-567-8988.
Delete patient SMITH from the PATIENT table.
NULL VALUES
According to Codd's rules, any RDBMS should support NULL values.
What is a null value?
It is an unknown or undefined value. How do you insert a NULL value into a table?
For example, suppose you have a table called APT_ENQUIRY with the following structure.
    ENQ_NAME  char(25)     not null
    PHONE     char(10)     not null
    ADDRESS1  varchar(30)  not null
  • 57.     ADDRESS2  varchar(30)
    CITY      varchar(30)  not null
    STATE     char(2)      not null
    ZIP       char(5)      not null
In the rows below, the ADDRESS2 column for MARK has no value; that is a NULL value. How do you insert a null value when the value is undefined? If you omit the column name from your INSERT statement, that column gets a NULL value in the inserted row. You cannot omit a NOT NULL column from the INSERT statement.
    ENQ_NAME  PHONE         ADDRESS1      ADDRESS2    CITY      STATE  ZIP
    SMITH     675-098-3478  KING CORNER   9th STREET  NEW YORK  NY     01123
    MARK      972-890-7654  QUEEN STREET              DALLAS    TX     75240
Considerations while dealing with NULLs.
A NULL value is different from assigning a column the value 0 or a blank.
It cannot be compared using the relational or logical operators.
    select * from apt_enquiry where address2 = null        - This won't fetch any rows.
    select * from apt_enquiry where address2 is null       - This is right.
    select * from apt_enquiry where address2 <> null       - This is wrong.
    select * from apt_enquiry where address2 is not null   - This is right.
Select Statement
SELECT is the powerful SQL command we use the most in database activity. The SELECT statement is used to retrieve data from tables.
Employee table with data (the following examples and selects are based on this table, EMP)
    empno  ename    dob         mgr   deptno  job       sal   comm
    1001   Jones    10/10/1967  1013  10      MANAGER   4000  500
    1002   Dave     10/10/1950  1001  10      CLERK     3000
    1003   Jhonson  08/06/1955  1013  20      MANAGER   4000  50
    1004   David    06/10/1960  1003  20      SALESMAN  3500
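The NULL comparison rules above can be tried out on the EMP rows just listed, where Dave and David have no commission. The following is only a minimal sketch: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Oracle (an assumption made so the example can run anywhere), and the helper name `names` is hypothetical. Only four of the EMP columns are loaded.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal INTEGER, comm INTEGER)"
)
# Rows copied from the EMP table; Python None is stored as NULL.
conn.executemany(
    "INSERT INTO emp (empno, ename, sal, comm) VALUES (?, ?, ?, ?)",
    [
        (1001, "Jones", 4000, 500),
        (1002, "Dave", 3000, None),
        (1003, "Jhonson", 4000, 50),
        (1004, "David", 3500, None),
    ],
)


def names(where_clause):
    """Return the employee names matched by the given WHERE clause."""
    sql = "SELECT ename FROM emp WHERE " + where_clause + " ORDER BY empno"
    return [row[0] for row in conn.execute(sql)]


equals_null = names("comm = NULL")        # a comparison with NULL is never true
not_equals_null = names("comm <> NULL")   # also never true
is_null = names("comm IS NULL")           # the right way to find NULLs
is_not_null = names("comm IS NOT NULL")   # the right way to exclude NULLs

print(equals_null, not_equals_null, is_null, is_not_null)
```

Both `= NULL` and `<> NULL` return no rows at all, which is why IS NULL and IS NOT NULL are the only reliable tests.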
  • 58. Syntax
    SELECT col_name, col_name, ...
    FROM table_name
    WHERE condition
Selecting all columns
We can select all the columns from a table using * in the SELECT statement.
    SELECT * FROM EMP;
This displays all the rows and columns from the emp table. This sort of select is acceptable in the development environment, but we should not write it in the production environment.
Selecting particular columns
We can select particular columns from a table. If we want to select the empno, ename and sal column values from the EMP table, we can write the SELECT as follows.
    SELECT empno, ename, sal FROM Emp;
Column Aliases
When we select a column from a table, the column heading is the same as the column name. If we want to change the column heading for display purposes, we use aliases for the column names. If the alias includes a space, it must be enclosed in double quotes.
    SELECT empno "Employee Number",
           ename "Employee Name",
           sal Salary
    FROM Emp;
Specific Rows
Suppose we want to display the employee numbers and names of everyone who works in deptno 10. We need to display empno and ename, so the columns in the SELECT clause are empno and ename. In the FROM clause we specify the table name, i.e. EMP. What is the condition? We need the employees who work in deptno 10, so we write a WHERE clause. Here we are selecting specific rows within the table. So our Select statement will be
  • 59.     SELECT empno, ename
    FROM Emp
    WHERE deptno = 10;
Ordering Rows
If we want to display the result set in a particular order, we include the ORDER BY clause in the Select statement. Display the employee names and salary information, sorted alphabetically by employee name.
    SELECT ename, sal
    FROM Emp
    ORDER BY ename;
Suppose we want to display the result set by salary in descending order; then
    SELECT ename, sal
    FROM Emp
    ORDER BY sal DESC;
By default the order is ASC, i.e. ascending.
Expressions in Select statement
To get the sum of salary and commission we need to add two columns, sal and comm. You can do this in the Select statement itself.
    SELECT ename, sal, comm, sal + comm "Total"
    FROM emp;
If the comm column is null, adding sal to it gives a null value. So we can use the NVL function.
    SELECT ename, sal, comm, sal + nvl(comm,0) "Total"
    FROM emp;
To display all the employees whose employee numbers are even:
    SELECT empno
    FROM emp
    WHERE mod(empno,2) = 0;
Concatenating Strings
  • 60. Suppose we want to display the employee name and department number in the following format
    JONES works in deptno 10
In this format JONES is the ename column and 10 is the deptno column from the emp table. The text "works in deptno" is repeated for all the rows, so we concatenate the ename value with that text and the deptno value. To concatenate values in SQL you can use || or the CONCAT function. CONCAT takes exactly two arguments, so the calls are nested.
    SELECT ename || ' works in deptno ' || deptno
    FROM emp;
or
    SELECT CONCAT(CONCAT(ename, ' works in deptno '), deptno)
    FROM emp;
Querying Multiple Tables
Joins are used to combine columns from different tables. With joins, the information from any number of tables can be related. In a join, the tables are listed in the FROM clause, separated by commas. The condition of the query can refer to any column of any table joined. The connection between tables is established through the WHERE clause. Based on the condition specified in the WHERE clause, the required rows are retrieved.
The different types of joins are: Equi Joins, Cartesian Joins, Outer Joins, Self Joins.
Equi Joins
When two tables are joined together using equality of values in one or more columns, they make an equi join. Table prefixes are used to prevent ambiguity and the WHERE clause specifies the columns being joined.
Example
List the employee number, employee name, department number and department name.
Look at the information we want in this example. We can get the employee number, employee name and department number from the employee table, but the department name exists in the department table. To get all the information in one Select we join the two tables on a column common to both (in the WHERE clause). Here deptno is the common column between the emp and dept tables.
    Select empno, ename, emp.deptno, dname
    From emp, dept
    Where emp.deptno = dept.deptno;
Cartesian Joins
  • 61. If you select information from more than one table and do not specify a WHERE clause, each row of one table matches every row of the other table, i.e. a Cartesian join.
If you have a table TAB1 which has 25 rows and a table TAB2 which has 10 rows, and you join these two tables without a WHERE clause, you get 25 * 10 (250) rows as the result set.
Cartesian products are useful for finding all the possible combinations of rows from different tables.
Outer Joins
If there are values in one table that do not have corresponding values in the other, an equi join will not select those rows. Such rows can be forcibly selected by using the outer join symbol (+). The corresponding columns for that row will have NULLs.
Where would you use an outer join? For example, we have employee and department tables. In the department table deptno is the primary key; in the employee table deptno is a foreign key. By rule you cannot have a deptno in the employee table if it does not exist in the dept table, i.e. the primary and foreign key concept. But we can have a department record with no employees in that department.
In the emp table there is no record of an employee belonging to department 40. Therefore, in the equi join example above, the row for department 40 from the dept table will not be displayed.
Display the list of employees working in each department. Display the department information even if no employee exists in that department.
    Select empno, ename, dept.deptno, dname, loc
    from emp, dept
    where emp.deptno (+) = dept.deptno;
The outer join symbol (+) cannot be used on both sides of the condition.
Self Join
To join a table to itself means that each row of the table is combined with itself and with every other row of the table. The self join can be viewed as a join of two copies of the same table. The table is not actually copied, but SQL performs the command as though it were.
Ex: Get the employee name and the name of the manager assigned to that employee,
since a manager is also an employee in the employee table.
  • 62. Syntax:
    Select a.ename employee_name, b.ename manager_name
    from emp a, emp b
    where a.mgr = b.empno;
Built-in Database Functions
Character Functions
    LOWER (char variable) - Shows the string in lower case
    UPPER (char variable) - Shows the string in upper case
    LTRIM (char variable) - Removes the spaces " " on the left side of the string
    RTRIM (char variable) - Removes the spaces " " on the right side of the string
    SUBSTR (char variable, m [, n]) - Gets part of a string, starting at position m, optionally n characters long
    LENGTH (char variable) - Gives the length of the string
    INSTR (string variable, char) - Gives the position of the char you are searching for
    LPAD
    RPAD
    INITCAP (char variable) - The first letter of every word in the passed string becomes upper case
Examples for Character Functions
    Select lower('EXAMPLE FOR LOWERCASE') from dual;
    Select upper('example for upper case') from dual;
    Select ltrim('   left trim example') from dual;
    Select rtrim('right trim example   ') from dual;
    Select substr('you are correct', 1, 7) from dual;
    Select length('you are correct') from dual;
    Select instr('you are correct', 'correct') from dual;
    Select initcap('you are correct') from dual;
Arithmetic Functions
    ABS (numeric)
    CEIL (numeric)
    FLOOR (numeric)
    MOD
    POWER
    SIGN
    SQRT
  • 63.     TRUNC
    ROUND
Examples for Arithmetic Functions
    Select ABS(-9) from dual;
    Select MOD(5,2) from dual;
Date Functions
    Sysdate - Gives the current date.
    Add_months ( date variable, number of months to be added to that date )
    Months_between ( date variable d1, date variable d2 )
    To_date ( char variable, format of the char variable )
    Last_Day ( date variable )
    Next_Day ( date variable, day )
    To_Char ( date variable, format you want )
Examples for Date Functions
    SELECT sysdate FROM dual;
    SELECT to_date('1997 09 24', 'yyyy mm dd') FROM dual;
    SELECT months_between( sysdate, to_date('10-24-1994', 'MM-DD-YYYY') ) FROM dual;
    SELECT add_months( sysdate, 4 ) FROM dual;
    SELECT last_day( sysdate ) FROM dual;
    SELECT next_day( sysdate, 'monday' ) FROM dual;
    SELECT to_char( sysdate, 'day-month-yyyy' ) FROM dual;
Group Functions
Group functions are mostly used with GROUP BY, where the function produces one value for each group.
    Avg
    Sum
    Count
    Max
    Min
Group By Clause
  • 64. The GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.
We use the GROUP BY clause with aggregate functions, grouping the records based on a column. All columns in the SELECT list that are not in group functions must be in the GROUP BY clause.
Example:
To find the sum of salary by department.
Emp Table:
    Empno  Ename     Salary  Deptno
    7369   AGASSI    20000   10
    7499   WILLIAMS  10000   20
    7321   JIM       25000   10
    7200   HINGIS    9000    30
    7654   MARIA     19000   10
    7622   JULIE     7000    20
    7644   SANIA     10000   20
To select the sum of salary for each department we write the query as
    SELECT SUM(SALARY), DEPTNO FROM EMP GROUP BY DEPTNO;
The output will be:
    SUM(SALARY)  DEPTNO
    64000        10
    27000        20
    9000         30
The aggregate functions that can be used along with the GROUP BY clause include SUM(), MAX(), MIN(), COUNT() and AVG().
NOTE: If we list any columns in the SELECT statement that are not enclosed in aggregate functions, we must list those columns in the GROUP BY clause. We call this the "THUMB RULE".
Example:
    SELECT DEPTNO, COUNT(*) AS "NUMBER OF EMPLOYEES"
    FROM EMP
  • 65.     WHERE SALARY > 15000
    GROUP BY DEPTNO;
Because you have listed one column in your SELECT statement that is not enclosed in the COUNT function, the DEPTNO field must be listed in the GROUP BY section.
HAVING CLAUSE
The HAVING clause is used in combination with the GROUP BY clause. It can be used in a SELECT statement to filter the groups that GROUP BY returns (i.e. the HAVING clause acts like a WHERE clause applied to the groups).
Example:
    SELECT DEPTNO, SUM(SALARY) AS "TOTAL SALARY"
    FROM EMP
    GROUP BY DEPTNO
    HAVING SUM(SALARY) > 30000;
The above example gives each department number with the sum of the salaries in that department, and keeps only the departments where the sum of salary is greater than 30000.
DECODE
In Oracle/PLSQL, the DECODE function has the functionality of an IF-THEN-ELSE statement.
The syntax for the DECODE function is:
    DECODE( expression , search , result [, search , result]... [, default] )
expression is the value to compare.
search is the value that is compared against expression.
result is the value returned if expression is equal to search.
default is optional. If no matches are found, DECODE returns default. If default is omitted, DECODE returns null when no matches are found.
Example 1:
EMP Table:
    Empno  Deptno  Gender  Ename
    7499   10      M       Raghu
    7234   20      F       Sita
  • 66.     2345   30      M       Ramu
    1234   40      F       Rani
To show the gender column with M changed to F and F changed to M, we write the query as
    SELECT DECODE(gender, 'M', 'F', 'M') FROM EMP;
The output will be:
    gender
    F
    M
    F
    M
Example 2:
The following expression decodes the DEPTNO column of the EMP table. If DEPTNO is 10 the expression evaluates to ACCOUNTING, if 20 it evaluates to RESEARCH, if 30 it evaluates to SALES, and NONE is the default value.
    DECODE(deptno, 10, 'ACCOUNTING', 20, 'RESEARCH', 30, 'SALES', 'NONE')
The following example uses the decode expression in a SELECT statement.
    SELECT DECODE(deptno, 10, 'ACCOUNTING', 20, 'RESEARCH', 30, 'SALES', 'NONE')
    FROM EMP;
The output will be:
    DECODE
    ACCOUNTING
    RESEARCH
    SALES
    NONE
CASE
The CASE expression specifies conditions and results for a select or update statement. You can use CASE to search for data based on specific conditions or to update values based on a condition.
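The search/result/default matching that DECODE performs can be mirrored in a few lines of ordinary code, which may make the argument order easier to remember. The sketch below is written in Python purely for illustration; the `decode` helper is hypothetical (it is not an Oracle or Python library function) and Python's None stands in for NULL.

```python
def decode(expression, *args):
    """Mimic DECODE(expression, search, result [, search, result]... [, default])."""
    # An odd number of trailing arguments means the last one is the default.
    has_default = len(args) % 2 == 1
    default = args[-1] if has_default else None
    pairs = args[:-1] if has_default else args
    # Compare expression with each search value in order; the first match wins.
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expression == search:
            return result
    return default  # None stands in for NULL when no default is given


# Example 1 from the slides: swap M and F.
swapped = [decode(g, "M", "F", "M") for g in ["M", "F", "M", "F"]]

# Example 2 from the slides: map department numbers to names.
dept_names = [
    decode(d, 10, "ACCOUNTING", 20, "RESEARCH", 30, "SALES", "NONE")
    for d in [10, 20, 30, 40]
]

# No match and no default: the result is None (NULL in Oracle).
no_default = decode(40, 10, "ACCOUNTING")

print(swapped, dept_names, no_default)
```

In Example 1 the trailing 'M' is the default, so any value other than 'M' (here only 'F') is turned into 'M'.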
  • 67. The CASE expression can do all that DECODE does plus a lot of other things, including IF-THEN analysis, use of any comparison operator and checking of multiple conditions, all in the SQL query itself. Moreover, using CASE, multiple conditions that would otherwise need separate SQL queries can be combined into one, thus avoiding multiple statements on the same table.
Syntax for the CASE expression is:
    CASE
      WHEN condition 1 THEN result 1
      WHEN condition 2 THEN result 2
      ------
      WHEN condition n THEN result n
      ELSE default result
    END
Example 1:
The following statement gives the same result as the DECODE statement used above.
    SELECT EMPNO,
           CASE deptno
             WHEN 10 THEN 'ACCOUNTING'
             WHEN 20 THEN 'RESEARCH'
             WHEN 30 THEN 'SALES'
             ELSE 'NONE'
           END
    FROM EMP;
The output will be:
    empno  case
    7499   ACCOUNTING
    7234   RESEARCH
    2345   SALES
    1234   NONE
VIEW
  • 68. A view is a virtual table. A view consists of rows and columns just like a table. The difference between a view and a table is that views are definitions built on top of other tables (or views), and do not hold data themselves. If data changes in the underlying table, the same change is reflected in the view. A view can be built on top of a single table or multiple tables. It can also be built on top of another view.
DEF: Logically represents subsets of data from one or more tables.
ADVANTAGES:
    • To restrict data access
    • To make complex queries easy
    • To provide data independence
    • To present different views of the same data
SYNTAX
    CREATE VIEW viewname [(column name, ...)] AS subquery
    CREATE VIEW empvu80
    AS SELECT employee_id, last_name, salary
       FROM employees
       WHERE department_id = 80;
Retrieval Operations:
The contents of a view can be viewed using a SELECT statement.
Eg:
    1. Select * from view2
    2. Select totqty, title_id from totsales
Modification of views
You can add a record to the view by inserting a record in the base table. For example, you can insert a record into view2 by adding a record to the table Sales.
Take another example.
Create a table table1 with two fields, col1 and col2. col1 is NOT NULL and col2 allows nulls. Create a view view4 which has only col1. Insert a record into view4. Use a select statement to display the contents of table1 and view4.
You will find that table1 has a new record with a null value in col2. View4 also includes this new record. When you insert a new record through a view, the base-table columns
  • 69. that are not part of the view must allow null values. If they do not allow null values, inserting a record through the view is not possible.
If you want to delete a record from the view, you can do so by deleting it from the base table. Similarly, a view is updated through its base tables.
SYNTAX
Modify the EMPVU80 view by using the CREATE OR REPLACE VIEW clause. Add an alias for each column name.
    CREATE OR REPLACE VIEW empvu80
      (id_number, name, sal, department_id)
    AS SELECT employee_id, first_name || ' ' || last_name, salary, department_id
       FROM employees
       WHERE department_id = 80;
    • Column aliases in the CREATE VIEW clause are listed in the same order as the columns in the subquery.
REMOVING A VIEW:
    DROP VIEW view;
Sequences
A sequence:
    • Automatically generates unique numbers
    • Is a sharable object
    • Is typically used to create a primary key value
    • Replaces application code
    • Speeds up the efficiency of accessing sequence values when cached in memory
A sequence is an object which generates sequence numbers: the first time you get a value you get 1, the next time 2, the next time 3, and so on.
SYNTAX:
    CREATE SEQUENCE sequence
      [INCREMENT BY n]
      [START WITH n]
      [{MAXVALUE n | NOMAXVALUE}]
  • 70.       [{MINVALUE n | NOMINVALUE}]
      [{CYCLE | NOCYCLE}]
      [{CACHE n | NOCACHE}];
    • To drop a sequence: DROP SEQUENCE CUSTOMER_SEQ;
    • To alter a sequence: ALTER SEQUENCE CUSTOMER_SEQ CYCLE CACHE 100;
    • <seq_name>.CURRVAL : Returns the current value of the sequence.
    • <seq_name>.NEXTVAL : Returns the next value of the sequence. It also increments the value.
What is an Index?
An index:
    • Is a schema object
    • Is used by the Oracle server to speed up the retrieval of rows by using a pointer
    • Can reduce disk I/O by using a rapid path access method to locate data quickly
    • Is independent of the table it indexes
    • Is used and maintained automatically by the Oracle server
How Are Indexes Created?
Automatically: A unique index is created automatically when you define a PRIMARY KEY or UNIQUE constraint in a table definition.
Manually: Users can create nonunique indexes on columns to speed up access to the rows.
Creating an Index
    CREATE INDEX index ON table (column[, column]...);
When to Create an Index
You should create an index if:
    • A column contains a wide range of values
  • 71.     • A column contains a large number of null values
    • One or more columns are frequently used together in a WHERE clause or a join condition
    • The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows
When Not to Create an Index
It is usually not worth creating an index if:
    • The table is small
    • The columns are not often used as a condition in the query
    • Most queries are expected to retrieve more than 2 to 4 percent of the rows in the table
    • The table is updated frequently
    • The indexed columns are referenced as part of an expression
Removing an Index
    DROP INDEX index;
Synonyms
A synonym is another name for a table in the current database or in another database.
    CREATE [PUBLIC] SYNONYM synonym FOR object;
Creating and Removing Synonyms
    CREATE SYNONYM d_sum FOR dept_sum_vu;
REMOVING A SYNONYM
    DROP SYNONYM d_sum;
Introduction to PL/SQL
  • 72. PL/SQL is a procedural language which includes SQL. PL/SQL adds programming concepts such as variables, IF..THEN (conditional branching), loops and error handling. PL/SQL combines the power of SQL with procedural processing ability.
There are two types of blocks in PL/SQL:
    1. Anonymous Blocks: have no name (like scripts)
        • can be written and executed immediately in SQL*Plus
        • can be used in a trigger
    2. Named Blocks:
        • Procedures
        • Functions
Anonymous Blocks
The block is the basic structure in which you write PL/SQL. It starts with the DECLARE section, where you declare all the variables you need in the block. This section is optional: there is no need for a DECLARE section if you do not have any variables. Next comes the BEGIN section, which is mandatory, followed by the EXCEPTION section, where you handle errors, and the block ends with END;. An anonymous block is not stored in the database, so you usually save the code in a file and run it from there.
Block Structure
    DECLARE
      -- where you declare variables, like
      var_customer number(4);   -- declaring a variable named var_customer of data type number
      var_numofrows number(6);
    BEGIN
      -- where you actually perform the operation
      -- embedded select inside the PL/SQL block
      Select count(*) into var_numofrows from invoices where customer_no = var_customer;
      -- Display the value which you got
      dbms_output.put_line('The number of invoices we have for the customer ' || var_customer || ' is ' || var_numofrows);
    EXCEPTION
      -- where you handle the error
      WHEN others THEN
        dbms_output.put_line(SQLERRM);
    END;
Advantages of PL/SQL
    Modularity
    Reusability
    Maintenance
  • 73.     Abstraction
    Performance
    Data Integrity
    Data security
Difference between SQL & PL/SQL
SQL is non-procedural and interactive.
PL/SQL is a programming language in which we can declare variables and write code to process a job.
Variables and Constants
Variables and constants are used to hold and manipulate values within PL/SQL. In the DECLARE section of a block you declare the variables and their data types. Suppose we want to hold the customer number in a block; how do we declare a variable?
First we should know the data type of the customer number, whether it is a number or a char data type. If it is a number data type then
    var_custno NUMBER(5);
To declare a constant value
    ticket_price CONSTANT number(4) := 150;
SQL data types - CHAR, DATE, NUMBER, VARCHAR
PL/SQL data types - BOOLEAN, BINARY_INTEGER, EXCEPTION
Example for %TYPE
What is the use of a %TYPE declaration in PL/SQL instead of hardcoding the datatype?
    DECLARE
      var_custno number(3);
      var_custname varchar(100);
    BEGIN
      Select customer_name into var_custname from customer where customer_no = var_custno;
      dbms_output.put_line('Customer Name is ' || var_custname);
    EXCEPTION
      WHEN no_data_found then
        dbms_output.put_line('Customer does not exist in Customer table');
    END;
  • 74. Suppose this code is in the production database and after some days the customer_no column is changed from number(3) to number(4). Your program must then change as well, otherwise you will end up getting a numeric or value error when a four-digit customer number is assigned to the variable.
If we use %TYPE instead of number(3), we do not need to change the program. Changing the datatype in the table is automatically reflected in the PL/SQL block, because the block takes the datatype from the table when it is compiled.
    DECLARE
      var_custno CUSTOMER.CUSTOMER_NO%TYPE;
      var_custname CUSTOMER.CUSTOMER_NAME%TYPE;
    BEGIN
      Select customer_name into var_custname from customer where customer_no = var_custno;
      dbms_output.put_line('Customer Name is ' || var_custname);
    EXCEPTION
      WHEN no_data_found then
        dbms_output.put_line('Customer does not exist in Customer table');
    END;
Example for %ROWTYPE
With %ROWTYPE we can assign a whole selected row from a table to one variable. The ROWTYPE variable has as many fields as there are columns in the table from which it is defined.
The main use of ROWTYPE is that we can pass the whole row to a function or procedure instead of passing all the columns as separate arguments, so that maintenance is easier.
Declaring the ROWTYPE variable.
    variable_name TABLENAME%ROWTYPE;
Example
    DECLARE
      var_custno CUSTOMER.CUSTOMER_NO%TYPE;
      var_custrec CUSTOMER%ROWTYPE;
    BEGIN
      var_custno := &CustomerNumber;
      SELECT * into var_custrec FROM customer WHERE customer_no = var_custno;
      dbms_output.put_line( var_custrec.customer_name || ', ' || var_custrec.cust_addr );
    EXCEPTION
      WHEN no_data_found then
  • 75.         dbms_output.put_line('No data found for the Customer no you entered');
    END;
Record Datatype
A record in PL/SQL is a composite variable whose fields can have different datatypes. First we declare the record type, and then we declare a variable of that type so that we can use it in the block.
So there are two steps and the syntax is
    • TYPE rec_datatype IS RECORD ( var_name1 datatype, var_name2 datatype, ... );
    • var_rec rec_datatype;
Example for RECORD type
    DECLARE
      TYPE custinforec IS RECORD (
        var_custno customer.cust_no%TYPE,
        var_custname customer.cust_name%TYPE );
      var_custrec custinforec;
    BEGIN
      SELECT cust_no, cust_name into var_custrec FROM customer WHERE cust_no = 1123;
      dbms_output.put_line(var_custrec.var_custno);
    EXCEPTION
      WHEN no_data_found then
        dbms_output.put_line('No data found for the query');
    END;
Variable Scope
    DECLARE   -- Outermost block
      var_customer number(4);
    BEGIN
      -- In this block we can see the var_customer variable which we declared in this block
      DECLARE   -- inner block
        var_innername varchar(20);
      BEGIN
        -- In this block we can see the var_innername variable as well as var_customer,
  • 76.         -- which is declared in the outermost block.
        -- We cannot refer to the var_inexception variable from this block
        -- because it is in a different block.
        -- We can see the current block's variables as well as outer block variables.
        NULL;
      EXCEPTION
        -- Error handling for the inner block
        WHEN others THEN NULL;
      END;
    EXCEPTION
      WHEN others THEN   -- when an exception occurs
        DECLARE
          var_inexception number(4);
        BEGIN   -- another block
          -- In this block we can refer to the var_inexception variable as well as
          -- var_customer, which is declared in the outermost block.
          -- We cannot refer to the var_innername variable from this block
          -- because it is in a different block.
          NULL;
        EXCEPTION
          -- Error handling for the exception block, i.e. the current block
          WHEN others THEN NULL;
        END;
    END;
Control Statements and Loops
Used to control PL/SQL logic with conditional structures, with loops and with unconditional branching.
    PL/SQL Control Statement   Description
    IF-THEN-ELSE               Condition. If the expression is true then execute one sequence, else another sequence.
    LOOP                       Repeat a statement or set of statements unconditionally. You break the loop using the EXIT statement.
  • 77.     FOR-LOOP                   Repeat a statement or set of statements for a fixed number of times.
    WHILE-LOOP                 Repeat a statement or set of statements until the condition is FALSE.
    GOTO                       Branch to a new set of statements.
A loop executes the same block of code more than once. PL/SQL supports the various types of loops shown on this page.
IF..THEN Statement
is used to check a condition. If the condition is TRUE the THEN set of statements is executed, otherwise the ELSE set of statements is executed.
IF-THEN-ELSE
    DECLARE
      var_num1 number(4);
      var_num2 number(4);
    BEGIN
      var_num1 := 10;
      var_num2 := 20;
      IF var_num1 > var_num2 THEN
        dbms_output.put_line('The largest number is ' || to_char(var_num1));
      ELSE
        dbms_output.put_line('The largest number is ' || to_char(var_num2));
      END IF;
    END;

    DECLARE
      var_checkno number(4);
    BEGIN
      var_checkno := &tocheck;
      IF mod(var_checkno,2) = 0 THEN
        dbms_output.put_line('Even number');
      ELSE
        dbms_output.put_line('Odd number');
      END IF;
    END;
The following PL/SQL block lets you enter 3 numbers and finds the largest one. In it you can see an IF..THEN within another IF..THEN.
    DECLARE
      first_num number(3);
      sec_num number(3);
      third_num number(3);
    BEGIN
      first_num := &number1;
      sec_num := &number2;
  • 78.       third_num := &number3;
      IF first_num > sec_num THEN
        IF first_num > third_num THEN
          dbms_output.put_line('First Number ' || to_char(first_num) || ' is the greatest of all entered numbers');
        ELSE
          dbms_output.put_line('Third Number ' || to_char(third_num) || ' is the greatest of all entered numbers');
        END IF;
      ELSE
        IF sec_num > third_num THEN
          dbms_output.put_line('Second Number ' || to_char(sec_num) || ' is the greatest of all entered numbers');
        ELSE
          dbms_output.put_line('Third Number ' || to_char(third_num) || ' is the greatest of all entered numbers');
        END IF;
      END IF;
    END;
IF-THEN-ELSIF
Enter a customer number. If the total number of orders is less than 1000 the customer is OK, between 1000 and 2000 GOOD, otherwise a TOP CUSTOMER.
    DECLARE
      var_custno CUSTOMER.CUSTNO%TYPE;
      var_orders number(10);
    BEGIN
      var_custno := &CustomerNo;
      Select count(order_id) into var_orders From orders where customer_no = var_custno;
      IF var_orders < 1000 THEN
        dbms_output.put_line(to_char(var_custno) || ' is an OK customer');
      ELSIF var_orders between 1000 and 2000 THEN
        dbms_output.put_line(to_char(var_custno) || ' is a GOOD customer');
      ELSE
        dbms_output.put_line(to_char(var_custno) || ' is a TOP customer');
      END IF;
    END;
Unconditional Loop
What is an unconditional loop? It enters the loop first and then checks the condition to get out of the loop, whereas a conditional loop checks the condition first and, based on the result, decides whether to go into the loop or bypass the whole loop and continue with the next statement in the block.
    DECLARE
      var_running number(4);
  • 79.     BEGIN
      var_running := 1;
      Loop
        var_running := var_running + 1;
        dbms_output.put_line('The current number is ' || to_char(var_running));
        If var_running > 101 then
          Exit;
        End If;
      End Loop;
    END;
While Loop
The syntax for the While loop is
    WHILE condition LOOP
      -- pl/sql statements
    END LOOP;
For Loop
If you know the number of times you are going to execute the code, you can use the For loop in PL/SQL.
The syntax for the For loop is
    For var in starting_no..ending_no Loop
      -- write the code to execute so many times
    End Loop;
Example
To display the even numbers between 1 and 200 using a For loop in a PL/SQL block.
    BEGIN
      dbms_output.put_line('Even numbers between 1 and 200');
      dbms_output.put_line('-----------------------------------');
      -- the loop index var_runningvalue is declared implicitly by the For loop
      For var_runningvalue in 1..200 Loop
        If mod(var_runningvalue,2) = 0 then
          dbms_output.put_line(var_runningvalue);
        End If;
      End Loop;
    END;
GOTO Statement
In a block we can skip some of the statements and jump to another position using the GOTO statement. The form of the GOTO statement is GOTO label_name; and the target is marked with <<label_name>>.
Example
  • 80.     DECLARE
      var_empno employee.employeeno%type;
      var_empname employee.employeename%type;
      var_empstate employee.state_code%type;
      var_salary number(12,2);
      var_statetax number(5,2);
    BEGIN
      Select state_code, salary into var_empstate, var_salary
      From employee
      Where employeeno = var_empno;
      IF var_empstate = 'TX' THEN
        GOTO texas;
      END IF;
      Select state_tax into var_statetax From state where state = var_empstate;
      var_salary := var_salary - (var_salary * var_statetax/100);
      <<texas>>
      var_salary := var_salary + 0;   -- just add 0 to var_salary if the state is Texas
    END;
CURSORS
A cursor is the way you loop through the rows returned by a Select statement. For example, if you write SELECT customer_name FROM customer in SQL*Plus, it returns the set of rows from the customer table. Suppose your corporation decides to give discounts to customers based on how loyal each one is, how much business we do with that customer and so on. We need to check several things before giving a discount, so we have to look at the rows of the customer table one by one and make a decision based on the rules. Here we cannot use a single update statement on the related tables, and this is where a cursor comes into the picture. A cursor is a result set from which you can fetch rows one by one.
There are two different types of cursors
    • Explicit Cursors
    • Implicit Cursors
Implicit cursors: when you issue a select statement, the server executes the query, stores the rows in a memory area on the server and returns the rows in network packets to the client. Here you do not have control over the result set of rows.
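The row-by-row processing described above can be sketched outside Oracle as well. The example below uses Python's built-in sqlite3 module, whose cursor object follows the same pattern of running the query, fetching one row at a time and closing. It is only an illustration under stated assumptions: SQLite stands in for Oracle, and the `years_loyal` and `orders` columns and the discount rule are hypothetical, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer table; columns and discount rule are assumptions.
conn.execute(
    "CREATE TABLE customer (customer_name TEXT, years_loyal INTEGER, orders INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("SMITH", 6, 40), ("STEVE", 1, 3), ("BILL", 3, 25)],
)

discounts = {}
cur = conn.cursor()                      # create the cursor
cur.execute(                             # open: run the query
    "SELECT customer_name, years_loyal, orders FROM customer ORDER BY customer_name"
)
while True:
    row = cur.fetchone()                 # fetch the next row
    if row is None:                      # no more rows to fetch
        break
    name, years_loyal, orders = row
    # Per-row business rule applied in procedural code.
    if years_loyal >= 5 and orders >= 30:
        discounts[name] = 10
    elif years_loyal >= 2:
        discounts[name] = 5
    else:
        discounts[name] = 0
cur.close()                              # close the cursor

print(discounts)
```

The loop ends when a fetch returns no row, which plays the same role as testing for the end of the result set in a PL/SQL cursor loop.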
  • 81. Explicit cursors are the SQL statements where you have control over the result set and can fetch rows from it one by one.
Things to be done while dealing with cursors
    Declare a cursor
    Open a cursor
    Fetch data into variables from the cursor
    Close the cursor
The following picture will give you some idea about the cursor.
Remember while working with cursors
We must provide the same number of variables as the number of columns selected in the select statement of the cursor.
You cannot fetch once you have closed the cursor; if you do, it will raise an exception called invalid_cursor. You can fetch only after opening the cursor.
Cursor Attributes
    %ISOPEN     returns TRUE if the cursor is already open, FALSE if it is not open.
    %NOTFOUND   returns TRUE if the last fetch statement did not return a row.
    %FOUND      returns TRUE if the last fetch statement returned a row.
    %ROWCOUNT   total number of rows returned so far.
How do you declare a Cursor?
This is done in the DECLARE section of a PL/SQL block.
  • 82. DECLARE
    var_custname customer.cust_name%type;
    CURSOR getcustnames IS
        SELECT cust_name FROM customer;
BEGIN
    OPEN getcustnames;                       -- opening a cursor actually executes the SQL
                                             -- and identifies the rows in the server memory area
    LOOP
        FETCH getcustnames INTO var_custname; -- fetch the current record
        EXIT WHEN getcustnames%NOTFOUND;      -- once all the rows are used up, the
                                              -- %NOTFOUND cursor attribute will be TRUE
        dbms_output.put_line(var_custname);   -- display the customer name
    END LOOP;
    CLOSE getcustnames;                      -- close the cursor so that the server releases memory
END;

Passing Arguments to a cursor
DECLARE
    TYPE id_emp_table IS TABLE OF emp.empno%type INDEX BY BINARY_INTEGER;
    v_deptno          number(2);
    v_empno_hold      emp.empno%type;
    i                 BINARY_INTEGER := 1;
    -- cursor parameters take an unconstrained type (number, not number(2))
    CURSOR get_empno ( v_in_dept number ) IS
        SELECT empno FROM emp WHERE deptno = v_in_dept;
    empno_plsql_table id_emp_table;
BEGIN
    v_deptno := 30;
    OPEN get_empno(v_deptno);
    LOOP
        FETCH get_empno INTO v_empno_hold;
        IF get_empno%FOUND THEN
            empno_plsql_table(i) := v_empno_hold;
            i := i + 1;
        ELSE
            EXIT;
        END IF;
    END LOOP;
    CLOSE get_empno;
    -- i is one past the last stored element
    FOR j IN 1..i - 1 LOOP
        dbms_output.put_line( empno_plsql_table(j) );
    END LOOP;
END;

FOR UPDATE cursor
  • 83. DECLARE
    v_deptno number(3);
    CURSOR c1 IS
        SELECT empno, deptno FROM emp FOR UPDATE;
BEGIN
    FOR c1_record IN c1 LOOP
        IF c1_record.deptno = 40 THEN
            DELETE FROM emp WHERE CURRENT OF c1;
        END IF;
    END LOOP;
    COMMIT WORK;
END;

PROCEDURE
A procedure is nothing but a PL/SQL block wrapped up within a name so that the PL/SQL can be saved in the database.
What is the difference between a PL/SQL block and a Procedure?
When you execute a PL/SQL block, the RDBMS checks the syntax, parses the queries, creates the execution plan and then executes the block. When we create a stored procedure, the database checks the syntax and parses the queries at the time the procedure is saved, and stores all that information in the database. When we later execute the stored procedure it does not do all that work again; instead it executes the procedure using the existing information.
Syntax
CREATE OR REPLACE PROCEDURE procedure_name
    ( argument1 in/out data type, argument2 in/out data type.... )
AS
    PL/SQL Block
END procedure_name;
    IN argument
    OUT argument
    IN OUT argument
IN - passes a value from the calling environment into the procedure.
OUT - returns a value from the procedure to the calling program.
IN OUT - passes a value from the calling program, and the called program passes some other calculated value back to the calling program through the same variable.
The following diagram explains the difference between IN and IN OUT arguments passed to a stored procedure or stored function.
From where we call stored procedures
We can call a stored procedure from a PL/SQL block, another stored procedure, a function or a trigger.
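The three argument modes can be sketched with a small procedure called from an anonymous PL/SQL block. The procedure apply_raise and its parameter names are hypothetical, made up only for this illustration.

```sql
-- Hypothetical procedure, for illustration only.
CREATE OR REPLACE PROCEDURE apply_raise (
    p_pct    IN     NUMBER,  -- IN: value passed in, read-only inside the procedure
    p_salary IN OUT NUMBER,  -- IN OUT: passed in, changed, and passed back
    p_raise  OUT    NUMBER   -- OUT: set by the procedure for the caller
) AS
BEGIN
    p_raise  := p_salary * p_pct / 100;
    p_salary := p_salary + p_raise;
END apply_raise;
/

-- Calling the procedure from an anonymous PL/SQL block.
DECLARE
    v_salary NUMBER := 2000;
    v_raise  NUMBER;
BEGIN
    apply_raise(10, v_salary, v_raise);
    -- With these inputs v_raise is 200 and v_salary is 2200.
    dbms_output.put_line('New salary: ' || v_salary || ', raise: ' || v_raise);
END;
/
```

The caller must supply variables, not literals, for the OUT and IN OUT arguments, because the procedure writes back into them.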
  • 84.
PRODUCT_ID  PRODUCT_NM        QTY_ON_HAND  PRICE_PER_QTY  REORDER_LEVEL
1250        GEM MONITORS      25           $125           10
1251        Microsoft Win 98  100          $50            50

Write a stored procedure that runs when you sell a product: check whether the qty_on_hand is less than or equal to the reorder level and, if so, insert a row into the orders table. If you already placed an order for that product within the last 2 days, do not place another order.
CREATE OR REPLACE PROCEDURE check_update_reorder (
    prod_id  IN number,
    curr_qty IN number
) IS
    v_reorder_level products.reorder_level%type;
    v_dummy         number;
BEGIN
    SELECT reorder_level INTO v_reorder_level
      FROM products
     WHERE product_id = prod_id;
    IF curr_qty <= v_reorder_level THEN
        BEGIN
            -- look for an order placed within the last 2 days
            SELECT 1 INTO v_dummy
              FROM orders
             WHERE product_id = prod_id
               AND order_date >= trunc(sysdate) - 2
               AND rownum = 1;
        EXCEPTION
            WHEN no_data_found THEN
                -- no recent order exists, so place one
                INSERT INTO orders ( order_id, product_id, order_date )
                VALUES ( order_seq.nextval, prod_id, sysdate );
        END;
    END IF;
END check_update_reorder;

FUNCTION
A function is nothing but a stored PL/SQL program which takes arguments, performs some operation and returns a value back to the calling program.
Difference between Procedure and Function
A procedure may or may not return a value to the calling program. A function must always return a value to the calling program.
Syntax
CREATE OR REPLACE FUNCTION function_name
    ( argument1 in/out data type, argument2 in/out data type.... )
RETURN data type
AS
    PL/SQL block
  • 85. END function_name;
While writing a function we should have a RETURN statement within the PL/SQL block. You cannot execute a function the same way as a stored procedure. You should call a function from a PL/SQL block, from a SQL statement, or from another stored procedure or stored function, the reason being that the function returns a value and that value has to be received somewhere, such as in a variable.
Write a function to get the customer name from the customer table by passing the customer number.
CREATE OR REPLACE FUNCTION get_custname ( var_custno CUSTOMER.CUST_NO%TYPE )
RETURN char AS
    var_custnmhold CUSTOMER.CUST_NAME%TYPE;
BEGIN
    SELECT cust_name INTO var_custnmhold
      FROM customer
     WHERE cust_no = var_custno;
    RETURN var_custnmhold;
EXCEPTION
    WHEN no_data_found THEN
        RETURN NULL; -- no such customer
END get_custname;

Write a function to update the customer name by passing the customer number and the new name. If the row is found and updated then return 1, else return -1.
CREATE OR REPLACE FUNCTION func_upt_custname (
    var_custno   customer.cust_no%TYPE,
    var_custname customer.cust_name%TYPE
) RETURN number IS
BEGIN
    UPDATE customer
       SET cust_name = var_custname
     WHERE cust_no = var_custno;
    IF SQL%FOUND THEN
        RETURN 1;
    ELSE
        RETURN -1;
    END IF;
EXCEPTION
    WHEN others THEN
        RETURN -1;
END func_upt_custname;

PACKAGES
A package is an object in which you put all the related procedures and functions together. A package has two parts: one is the package spec and the other is the package body.
The package spec is nothing but an object in which you declare the procedure and
  • 86. function names which you are going to group together, along with their arguments, i.e. the declaration part of the procedures and functions. In the package body we write the code for all the procedures and functions we declared in the package spec.
Every procedure and function declared in the package spec must be defined in the body, otherwise you will get an error when you try to save the body.
Syntax to create the Package Spec
CREATE OR REPLACE PACKAGE <spec_name> AS
    declare variables here so that any procedure or function within this package can use them.
    Subprogram declarations. Example:
    PROCEDURE invoice_monthly_report ( var_mnthyear char );
    FUNCTION check_invoice_balance ( var_invno number ) RETURN number;
END <spec_name>;
Syntax to create the Package Body
CREATE OR REPLACE PACKAGE BODY <spec_name> AS
    PROCEDURE invoice_monthly_report ( var_mnthyear char ) AS
        declare variables
    BEGIN
        write the pl/sql code
    EXCEPTION
        handle the exceptions
    END invoice_monthly_report;
    FUNCTION check_invoice_balance ( var_invno number ) RETURN number AS
        declare the variables
    BEGIN
        write pl/sql code
        return the value to the calling program
    EXCEPTION
        handle the exception
        return the value (may be -1 if the program failed)
    END check_invoice_balance;
END <spec_name>;

TRIGGERS
A trigger is a stored program which gets executed when an event occurs on a table, that event being an insert, update or delete statement. You cannot call a trigger like a stored procedure or a function, and you cannot pass any arguments to a trigger.
Following are the different types of triggers on a table.
  • 87. • Insert Trigger (Before statement, Before Row, After Row, After Statement)• Update Trigger (Before statement, Before Row, After Row, After Statement)• Delete Trigger (Before statement, Before Row, After Row, After Statement
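As a minimal sketch of how the row-level and statement-level trigger timings differ, the two triggers below use the products table from the earlier stored procedure example. The price_audit table and both trigger names are assumptions made only for this illustration.

```sql
-- Row-level trigger: fires once for every row changed by the UPDATE.
-- price_audit is a hypothetical table, for illustration only.
CREATE OR REPLACE TRIGGER trg_products_price_audit
AFTER UPDATE OF price_per_qty ON products
FOR EACH ROW
BEGIN
    -- :OLD and :NEW hold the row values before and after the update.
    INSERT INTO price_audit ( product_id, old_price, new_price, changed_on )
    VALUES ( :OLD.product_id, :OLD.price_per_qty, :NEW.price_per_qty, SYSDATE );
END;
/

-- Statement-level trigger: fires once per DELETE statement,
-- no matter how many rows the statement removes.
CREATE OR REPLACE TRIGGER trg_products_before_delete
BEFORE DELETE ON products
BEGIN
    dbms_output.put_line('About to delete from products');
END;
/
```

Leaving out FOR EACH ROW is what makes the second trigger statement-level; :OLD and :NEW are available only in row-level triggers.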