"Oracle Archiving Best Practices"

Speaker notes
  • The agenda today covers four areas. First we’ll talk about data management functions and outline, at a high level, the different data management job roles and duties. Then we’ll describe the problems of long-term data storage: what is driving the need and the difficulties that will be encountered. Next we’ll describe several “solutions” that have been attempted. And finally we’ll outline the components and capabilities of a database archiving solution.
  • Organizations are processing and storing more and more data every year. Average yearly rate of growth: 125%. Large volumes of data interfere with operations (the more data in the operational database, the slower all processes may run). And regulations (as well as business practices) dictate that data, once stored, be retained for longer periods of time: no longer months or years, but in some cases multiple decades. More varied types of data are being stored in databases: not just (structured) characters, numbers, dates, and times, but also (unstructured) large text, images, video, and more. Unstructured data greatly expands storage needs. The retained data must be protected from modification; it must represent the authentic business transactions at the time the business was conducted. This drives the need for better protection from modification and for isolation of content from changes.
  • Database Archiving is part of a larger topic, namely Data Archiving. This particular graph is a representation of the data archiving classification put together by one of the analyst groups. Each of these “things” needs to be archived to fulfill regulatory, legal, and business requirements. Each of these areas, though, has different archival processing requirements due to its form and nature. What works to archive e-mail is not sufficient for archiving database data, and so on. In other words, each box commands its own technology. Now there may be a need for integration across these areas. On one flight I was sitting next to an energy company executive who was responsible for capturing data after explosions. Any time there was an explosion, he and his team immediately flew in and gathered as much data as possible before it got contaminated or destroyed. Such information is imperative in determining cause, as well as for possible future legal claims. At any rate, he was interested in a single product or solution that could store and archive all of these items. Such a beast does not yet exist.
  • This slide delineates the various states of data over its useful life. This could be three separate databases or a single database or any combination thereof. Don’t think about data warehousing in this context – here we are talking about the single, official store of data. We start out, of course, with creation of data. After creating the data it moves into its operational state. This is where it serves its primary business purpose. Transactions are enacted upon data in this state. Then we move to the reference state. Data in this form is still needed for reporting and querying purposes. After this state we move into an area where the data is no longer needed for completing business transactions and the chance of it being needed for querying and reporting is small to none. However, the data still needs to be saved for regulatory compliance and other legal purposes, particularly if it pertains to a financial transaction. Finally, after a period of time, the data is no longer needed at all and it can be discarded. This actually should have a much stronger emphasis: the data must be discarded. In many cases the only reason old data is being kept is to enable lawsuits. When there is no legal requirement to maintain such data, companies demand that such data be destroyed – why enable anyone to sue you if it is not a legal requirement?
  • Here we have some examples of regulations that impact data retention. This is just a sampling of the more than 150 different regulations (at the local, state, national, and international levels) that impact data retention. Regulations drive retention and discovery. Source: www.domains.cio.com/symantec/wp/ebs_ediscovery_ev_wp.pdf
  • Archiving is a process used to move data from the operational database to another data store to be kept for the duration of the retention period when it is unacceptable to keep the data in the operational database for that long. So, why might it be unacceptable? - large volumes of data interfering with operations (the more data in the operational database, the slower all processes may run) - need for better protection from modification - need for isolation of content from changes
  • OK, so if we have to archive data, how can we go about doing it? Well, there is always bullet number 1. If you can store it in the operational database then you save a lot of work. But it does not work well for large amounts of data or very long retention periods. And the more data in the database, the less efficient transaction processing and reporting will be. So we could take a simple approach of using UNLOAD files. But this is not really useful in today’s environment. It assumes the same data schema will be used if you ever want to RELOAD the data (and that is not likely over long periods of time). When sites run into audit situations, where they're asked to produce transaction records spanning multiple years, they have to slog through the work of restoring the data. This is a very labor-intensive, manual process. And costly: some sites have spent hundreds of thousands of dollars to respond to audit requests when they have to resort to such manual data restoration. It might be feasible for 2 to 3-year timespans, but not 20 or more years. In situation number 3, we consider cloning and moving data. But this has a lot of the same problems as UNLOADing the data. Think about what happens as the database schema changes, and how you would go about accessing the data. Finally, we come to the proper approach for retaining large amounts of data for long periods of time: database archiving.
  • This diagram depicts the necessary components of a database archiving solution. Starting with the databases and moving down is the extract portion, and up the right side is the data recall portion. On the bottom left is the metadata this whole process requires to operate: things like the structure of the operational database and the structure of the archive, and metadata about data retention for the archive itself, used to determine when to discard data from the archive. We also need a query mechanism to be able to query the archived data, and we need to support on-going maintenance of the archive: security, auditing of archive access, and archive administration (archive file maintenance, policy changes, etc.). Note: manual effort is required to recall archived data to another database.
  • OK, so let’s take a look at the actual functionality needed to support database archiving. This slide outlines the various features and functions needed and we will discuss each of these bullets in separate ensuing slides.
  • Think about this: we are not just archiving pieces of data, but logical “things” that we may need to access later. For example, you will archive a purchase order cherry-picked out of the database – perhaps across multiple tables or structures. This implies that there are policies, perhaps complex ones, that will drive what data is archived and when it is archived. This needs to be driven by the business user knowledgeable about the data’s characteristics.
  • This slide shows the data stored in each table. The archive design process is going to reflect a lot of decision making: which data items need to be archived; in some cases, which table actually contains the archive record; how the database relationships should be reflected in the archive; and whether there are other relationships (perhaps coded in the application) that need to be addressed in the archive.
  • As we discussed already, data retention requirements are stated in decades. This means that our Archive will outlive the applications/DBMS/systems that generated it – there may be a DB2 35 years from now, but it will look a LOT different than it does today. It also means that our Archive will outlive the people who designed and managed the operational systems. And the Archive will outlive the media we store it on, so we will need to manage the data over time and re-platform (no media is guaranteed to exist over long periods of time). Media rot destroys tape, and most tapes are not guaranteed for more than 7 years. For example, there are only 6 minutes of tape left of the first Super Bowl. How do we resolve these issues? Use a unique data store, because a standard DBMS is not the right answer; the archive must be immune from DBMS changes over time. Application/DBMS/system independence, because you don’t want your archive to prevent you from improving your operational environment. Metadata independence, because we don’t want to rely on reading the DB2 Catalog or some copybook for metadata. Continuous management of storage, because of media rot. Continuous management of archive content – administration of the archive to ensure its viability.
  • Because the operational system WILL change over such long retention periods, the archive needs to be independent from the operational system. We constantly change our databases, applications, computing platforms, and so on, to support new technology, M&A, or even management whim. The Archive needs to be able to support multiple variations of the application as it changes. This includes keeping up to date with the metadata changes within each variation of the system for which we are archiving data. The HOW here is interesting. The Archive will never change metadata in any partition. The Archive data is static; unchanging. This means we flip the normal operating mode, which is to modify the existing data to the new format when metadata changes. Instead, when we query against the Archive, data can come back in multiple formats – basically, whatever the format of the data was when it was archived (because it can NOT change once archived).
  • The Archive needs its own metadata – it cannot depend on metadata in application programs. For the Archive to be effective we must be able to interpret the archived data from nothing but what is in the archive. And remember, some of the metadata can change over time so it must be attached to the data in the partitions of the archive as of that state in time.
  • The data must be authentic. If it can change after it is archived then it is not of any value or use. This is a requirement for lawsuits and regulatory compliance. Furthermore, if the data is available to the business it can be used for business analysis, too. Do you really think it will NOT be used once it is there? There are many HOWs that are important here: the query language must not support modification, the data can be encrypted, checksum can be used to verify data accuracy, and backup copies can be used for verification, too. Remember, the archive data NEVER changes, so one backup (maybe two to be safe) is all that is needed.
  • We’ve hinted at this in previous slides, but we make the requirement explicit here: we must be able to query data directly from the Archive itself. This means a query language with an SQL-like interface (for compatibility purposes). The query is the portion of the Archive that is actually most important. If you cannot access the data that is retained, then it is as if it is not retained at all. The answers to the queries can be fuzzy though. A query may span multiple partitions representing many years and states of the data. So the query processing must keep this in mind as it delivers an answer. The answer may fall into several categories: guaranteed accurate, fuzzy because of changes over time, or perhaps even no answer at all that is useful because of metadata changes. So we’ll need experts able to compose and re-compose queries against an Archive taking such constraints into account.
  • The ability to discard the data exactly when it must be gotten rid of is important. If the legal requirement is to retain the data for 35 years, then 35 years + 1 day is not acceptable! The data is there, in many cases, to support lawsuits against your company. If there is no legal mandate to keep it then why go to the expense of keeping it to help others sue you? Of course, you must be able to deliver on discovery in a lawsuit. This means querying the Archive to deliver the requested data. This is a potentially HUGE issue. If you lose once in court because you cannot produce data for discovery, then lawyers will descend upon your company filing suit against that same data they know you don’t have. And you will lose. Now keep in mind that Discard does not just mean delete. It means REALLY and TRULY delete forever. Zero out the data by writing over top of it. It must not be able to be recovered by forensic experts or software.
  • Use the customer example here on how determining the archive strategy impacted business decisions: if customer account data was not online forever, but was still accessible in some way.
  • You can transport individual partitions in version 11G.
  • You can partition for performance or manageability, but usually you get the benefits of both. Speaking of compression: http://www.oracle.com/technology/products/bi/db/10g/pdf/twp_data_compression_10gr2_0505.pdf

      create or replace function compression_ratio (tabname varchar2)
      return number is
        pragma autonomous_transaction;
        -- sample percentage
        pct number := 0.000099;
        -- original block count (should be less than 10k)
        blkcnt number := 0;
        -- compressed block count
        blkcntc number;
      begin
        execute immediate 'create table TEMP_UNCOMPRESSED pctfree 0 as select * from ' || tabname || ' where rownum < 1';
        while ((pct < 100) and (blkcnt < 1000)) loop
          execute immediate 'truncate table TEMP_UNCOMPRESSED';
          execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' || tabname || ' sample block (' || pct || ',10)';
          execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_UNCOMPRESSED' into blkcnt;
          pct := pct * 10;
        end loop;
        execute immediate 'create table TEMP_COMPRESSED compress as select * from TEMP_UNCOMPRESSED';
        execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) from TEMP_COMPRESSED' into blkcntc;
        execute immediate 'drop table TEMP_COMPRESSED';
        execute immediate 'drop table TEMP_UNCOMPRESSED';
        return (blkcnt/blkcntc);
      end;
      /
  • Interval partitioning can be set up to create a new partition when new data is inserted, or at the first of the month, or whatever. For example, Oracle could create a new partition for every month in the calendar year: a partition is automatically created for ‘October 2007’ as soon as the first record for this month is inserted. Without REF partitioning you have to duplicate all partitioning key columns from the parent to the child table in order to take advantage of the same partitioning strategy. REF partitioning allows you to naturally partition tables according to the logical data model without requiring you to store the partitioning key columns. The partition advisor, part of the SQL Access Advisor, will show the anticipated performance gains.
  • (Note: XML satisfied all of this except for volume.)
  • Unindexed FKs: The problem with unindexed foreign key columns is that on a delete cascade a full scan will be performed on the child table. And this full scan will occur once for each row deleted from the parent.
  • (Note: XML satisfied all of this except for volume.)
  • Concerning RI, the truncate statement will fail.
  • You would not have this issue if SQL DELETE was not the method of row removal.
  • (Note: XML satisfied all of this except for volume.)
  • The high water mark starts at the first block of a newly created table. As data is inserted, the high water mark rises. And the HWM will remain at that level in spite of delete operations. The HWM matters since Oracle will scan all blocks below the HWM during a full scan, even when they contain no data – just to see if they have data. TRUNCATE will reset the HWM; so will other operations described next.
  • Note: What does SHRINK SPACE CASCADE do? It shrinks dependent indexes as well. https://metalink.oracle.com/metalink/plsql/f?p=130:14:804894980766538307::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,242090.1,1,1,1,helvetica
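    A minimal sketch of that statement against a hypothetical ORDER_HIST table (row movement must be enabled first):

      ALTER TABLE order_hist ENABLE ROW MOVEMENT;
      ALTER TABLE order_hist SHRINK SPACE CASCADE;   -- shrinks the table and its dependent indexes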
  • On a move the existing data stays until the entire table is copied. So, temporarily, you will need twice the space.
  • On a move the existing data stays until the entire table is copied. So, temporarily, you will need twice the space. This script will show you unindexed foreign key columns: http://asktom.oracle.com/tkyte/unindex/unindex.sql From Ask Tom: “If you have indexes on the fkeys, this question is MOOT for you -- the locking at the table level happens ONLY on unindexed foreign keys. Since the server cannot know beforehand if there are any child records or not -- the lack of an index on the fkey will cause the child table to be locked upon delete of a parent record regardless of whether a row exists or not in the child. This really isn't to do with "on delete cascade" -- it is to do with a) having a foreign key b) not having an index on that fkey c) updating the parent's primary key or deleting from parent.”
  • Transcript

    • 1. Oracle Data Archiving: Taming the Beast. Dave Moore, Neon Enterprise Software
    • 2. Agenda: Archiving Defined; Requirements and Solutions; Oracle Archiving Strategies; Oracle Row Removal Options; Oracle Post Archive Operations
    • 3. Dave
        • Oracle ACE
        • Using Oracle since 1991
        • Product Author at Neon Enterprise Software
        • Creator of OracleUtilities.com
        • Author of “Oracle Utilities” from Rampant Tech Press
        • Core competencies include performance, utilities and data management
    • 4. Database Archiving: The process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. (Slide diagram label: Purge.)
    • 5. Trends Impacting Archive Needs. Data retention issues: volume of data; length of retention requirement; varied types of data; security issues. (Chart labels: Amount of Data; Time Required; Compliance Protection; 0 to 30+ years.)
    • 6. Archiving All Types of Data: Paper, Blueprints, Forms, Claims, Word, Excel, PDF, XML, IMS, DB2, Oracle, Sybase, SQL Server, IDMS, VSAM, Programs, UNIX Files, Outlook, Lotus Notes, Attachments, Sound, Pictures, Video
    • 7. Data Archiving and ILM. (Diagram: the mandatory retention period runs from Create to Discard across three states. Operational: needed for completing business transactions. Reference: needed for reporting or expected queries. Archive: needed for compliance and business protection.)
    • 8. Some Sample Regulations Impacting Data Retention
    • 9. What Does It All Mean?
      • Enterprises must recognize that there is a business value in organizing their information and data.
      • Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit.
      • (Gartner, 20-April-2007, Research Note G00148170: Cost of E-Discovery Threatens to Skew Justice System)
    • 10. Operational Efficiency
      • Database Archiving can be undertaken to improve operational efficiency
        • Large volumes of data can interfere with production operations
            • efficiency of transactions
            • efficiency of utilities: BACKUP/RESTORE, REORG, etc.
            • Storage
              • Gartner: databases copied an average of 6 times!
    • 11. What Solutions Are Out There?
        • Keep Data in Operational Database
          • Problems with authenticity of large amounts of data over long retention times
        • Store Data in UNLOAD files (or backups)
          • Problems with schema change and reading archived data; using backups poses even more serious problems
        • Move Data to a Parallel Reference Database
          • Combines problems of the previous two
        • Move Data to a Database Archive
    • 12. Components of a Database Archiving Solution. (Diagram labels: Production Database; Data Extract; Archive Data Store holding data & metadata; Store and Retrieve Archive Data; Query Access; Archive Administration; Data Recall to a Recall Database; Metadata and Policies, including Captured Structure, Archive Policies, Data Retention, and History.)
    • 13. Archiving Requirements
        • Policy based archiving: logical selection
        • Keep data for very long periods of time
        • Store very large amounts of data in archive
        • Maintain Archives for ever-changing operational systems
        • Become independent from Applications/DBMS/Systems
        • Protect authenticity of data
        • Access data when needed; as needed
        • Discard data after retention period automatically
    • 14. Policy based archiving
        • Why :
          • Business objects are archived, not files
          • Rules for when something is ready can be complex
          • Data ready to be archived is distributed over database
        • Implications:
          • User must provide policies for when something is to be archived
        • How:
          • Full metadata description of data
          • Flexible specification of policy : “WHERE clause”
    • 15. For Example… Parts Master is the parent table to all other tables.
        • PARTS MASTER: Part Number, Type, Description, Unit Type, Cost, Price, Substitute Parts
        • ORDER INFO: Part Number, PO Number, Vendor ID, Quantity Ordered, Unit Cost, Date Ordered, Date Received
        • DISBURSEMENT: Part Number, Dept. ID, CHIT ID, Qty Disbursed, Date Disbursed
        • STORAGE INFO: Part Number, Bin Number, Qty on Hand, Qty on Order, Qty Backorder
        • SUMMARY BY QUARTER: Part Number, Year, Q1 Disbursed, Q2 Disbursed, Q3 Disbursed, Q4 Disbursed
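      For instance, a hedged sketch of an archive-selection policy against the ORDER INFO and STORAGE INFO tables above, expressed as a WHERE clause (the seven-year cutoff and the "no longer stocked" condition are purely illustrative):

        -- Candidate rows for archiving: orders received more than seven years ago,
        -- for parts that are no longer stocked
        SELECT oi.*
          FROM order_info oi
         WHERE oi.date_received < ADD_MONTHS(SYSDATE, -84)
           AND NOT EXISTS (SELECT 1
                             FROM storage_info si
                            WHERE si.part_number = oi.part_number
                              AND si.qty_on_hand > 0);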
    • 16. Keep Data for a Long Time
        • Why: retention requirements in decades
        • Implications:
          • Archive will outlive applications/DBMS/systems that generated them
          • Archive will outlive people who designed and managed operational systems
          • Archive will outlive media we store it on
        • How:
          • Unique data store
          • Application/DBMS/system independence
          • Metadata independence
          • Continuous management of storage
          • Continuous management of archive content
    • 17. Maintain Archive for Changing Operational Systems
        • Why :
          • Metadata changes frequently
          • Applications are re-engineered periodically
            • Change DBMS platform
            • Change System platform
            • Replace with new application
            • Consolidate after merger or acquisition
        • Implications:
          • Archive must support multiple variations of an application
          • Archive must deal with metadata changes
        • How:
          • Manage applications as major archive streams having multiple minor streams with metadata differences
          • Achieve independence from operating environment
    • 18. Achieve Metadata Independence
        • Why :
          • Operational metadata is inadequate
          • Operational metadata changes
          • Operational systems keep only the “current” metadata
          • Data in archive often does not mirror data in operational structures
        • Implications:
          • Archive must encapsulate metadata
          • Metadata must be improved
        • How:
          • Metadata Capture, Validate, Enhance capabilities
          • Store structure that encapsulates with data
          • Keeps multiple versions of metadata
    • 19. Protect Authenticity of Data
        • Why :
          • Potential use in lawsuits/ investigations
          • Potential use in business analysis
        • Implications:
          • Protect from unwanted changes
          • Show original input
          • Cannot be managed in operational environment
        • How:
          • SQL Access that does not support I/U/D
          • Do not modify archive data on metadata changes
          • Encryption as stored
          • Checksum for detection of sabotage
          • Limit access to functions
          • Audit use of functions
          • Maintain offsite backup copies for restore if sabotaged
    • 20. Access Data Directly From Archive
        • Why :
          • Cannot depend on application environment
        • Implications:
          • Full access capability within archive system
        • How:
          • Industry standard interface (e.g. JDBC)
          • LOAD format output
            • For load into a database
            • May be different from source database
          • Requires full and accurate metadata
          • Ability to review metadata
          • Ability to function across metadata changes
    • 21. Discard Function
        • Why :
          • Legal exposure for data kept too long
        • Implications:
          • Data cannot be kept in archive beyond retention period
          • Must be removed with no exposure to forensic software
        • How:
          • Policy based discard
          • System level function
          • Tightly controlled and audited
          • True “zero out” capability
          • Discard from backups as well
    • 22. Database or Archive? (Diagram: Performance, Space, and Compliance weighed between Keep in DB and Keep in Archive.)
    • 23. Based on Data Availability. (Diagram: must be available to the app: keep in DB. Must be available and must be secure: keep in Archive. Not needed: purge.)
    • 24. Oracle Archiving Strategies
        • Designed Up Front (Yeah, right)
        • Determined by Application Owner
        • Implemented by ____________
        • Utilize Oracle Features
    • 25. Finding Large Tables
        • DBA_SEGMENTS (bytes)
        • DBA_TABLES (num_rows)
        • or based on I/O
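      For the DBA_SEGMENTS route, a minimal sketch (run with DBA privileges; the ten-row cutoff is arbitrary):

        -- Ten largest tables by allocated space
        SELECT *
          FROM (SELECT owner,
                       segment_name,
                       ROUND(bytes / 1024 / 1024) AS size_mb
                  FROM dba_segments
                 WHERE segment_type = 'TABLE'
                 ORDER BY bytes DESC)
         WHERE ROWNUM <= 10;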
    • 26. Rolling Windows
        • Self Managing
        • Mostly based on DATE
        • Utilize DBMS Features
          • Partitioning
          • Transportable Tablespaces
            • Exchange Partition
            • Set tablespace read only
            • Expdp
            • Copy export file and data file
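      A hedged sketch of that rolling-window sequence, assuming the oldest partition already lives in its own tablespace ARCH_2001 and an empty staging table SALES_2001 has been created there (all names illustrative):

        -- 1. Swap the oldest partition into a standalone table (a metadata-only operation)
        ALTER TABLE sales EXCHANGE PARTITION p_2001 WITH TABLE sales_2001;

        -- 2. Make the tablespace read only
        ALTER TABLESPACE arch_2001 READ ONLY;

        -- 3. Export the tablespace metadata with Data Pump (OS command):
        --    expdp system DIRECTORY=dp_dir DUMPFILE=arch_2001.dmp TRANSPORT_TABLESPACES=arch_2001

        -- 4. Copy the dump file and the tablespace's datafile(s) to the archive destination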
    • 27. Rolling Windows via Partitioning. (Diagram: partitions P1 … P47. Data profile ranges from Heavily Accessed to Rarely Accessed to Probably Never Accessed; storage profile from Read/Write on fast, expensive storage, to Read Only on not-so-fast or expensive storage, to Read Only/Compressed on storage as cheap as you can get.)
    • 28. Why not use transportable tablespaces or Oracle exports for data retention?
    • 29. The Problem with Oracle Files
        • Transportable Tablespaces
        • Exports
        • Backups
      Export Files & Datafiles Version 16Z Oracle Year 2030 Import Trans Tsp Year 2007 Not a good method for LT Data Retention
    • 30. Partitioning (Old ways)
        • Range Partitioning
        • Data is distributed based on partition key range of values – usually a date.
        • Good When: Data is date-based.
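      A minimal range-partitioning sketch (illustrative table and dates):

        CREATE TABLE invoices (
          invoice_id   NUMBER,
          invoice_date DATE,
          amount       NUMBER
        )
        PARTITION BY RANGE (invoice_date) (
          PARTITION p2006 VALUES LESS THAN (TO_DATE('01-01-2007','DD-MM-YYYY')),
          PARTITION p2007 VALUES LESS THAN (TO_DATE('01-01-2008','DD-MM-YYYY')),
          PARTITION pmax  VALUES LESS THAN (MAXVALUE)
        );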
    • 31. Partitioning (Old Ways)
        • Hash Partitioning
        • Uses hash algorithm to create equally sized buckets of data.
        • Good When: No natural partition key and desire I/O balancing (hot spots).
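      A minimal hash-partitioning sketch (illustrative table; eight partitions spread rows evenly by SESSION_ID):

        CREATE TABLE web_hits (
          hit_id     NUMBER,
          session_id NUMBER,
          hit_time   DATE
        )
        PARTITION BY HASH (session_id)
        PARTITIONS 8;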
    • 32. Partitioning (Old Ways)
        • List Partitioning
        • Data is distributed based on LIST of values in partition key.
        • Good When: Have short list of values (States, Regions, Account Types)
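      A minimal list-partitioning sketch (illustrative table and values):

        CREATE TABLE accounts (
          account_id NUMBER,
          region     VARCHAR2(10)
        )
        PARTITION BY LIST (region) (
          PARTITION p_east  VALUES ('EAST'),
          PARTITION p_west  VALUES ('WEST'),
          PARTITION p_other VALUES (DEFAULT)
        );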
    • 33. Partitioning (New Ways – 11G)
        • Interval Partitioning
        • Initial Partition is created manually, the rest are automatically created as new data arrives.
        • Good When: Need a rolling window!
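      A minimal interval-partitioning sketch (illustrative table; Oracle adds a new monthly partition automatically as data arrives):

        CREATE TABLE sales (
          sale_id   NUMBER,
          sale_date DATE
        )
        PARTITION BY RANGE (sale_date)
        INTERVAL (NUMTOYMINTERVAL(1, 'MONTH')) (
          PARTITION p_first VALUES LESS THAN (TO_DATE('01-01-2007','DD-MM-YYYY'))
        );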
    • 34. Partitioning (New Ways – 11G)
        • REF partitioning
        • Related Tables benefit from same partitioning strategy, whether column exists in children or not!
        • Good When: Desire related data to be partitioned in the same manner.
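      A minimal REF-partitioning sketch (illustrative parent/child tables; the child inherits the parent's partitioning without storing ORDER_DATE):

        CREATE TABLE orders (
          order_id   NUMBER PRIMARY KEY,
          order_date DATE NOT NULL
        )
        PARTITION BY RANGE (order_date) (
          PARTITION p2007 VALUES LESS THAN (TO_DATE('01-01-2008','DD-MM-YYYY'))
        );

        CREATE TABLE order_items (
          item_id  NUMBER,
          order_id NUMBER NOT NULL,
          CONSTRAINT order_items_fk FOREIGN KEY (order_id) REFERENCES orders
        )
        PARTITION BY REFERENCE (order_items_fk);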
    • 35. Partitioning (New Ways – 11G)
        • Virtual Column Partitioning
        • Partition key may be based on virtual column
        • Good When: Virtual column is required for partition key.
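      A minimal virtual-column-partitioning sketch (illustrative table partitioned on a derived hire year):

        CREATE TABLE employees (
          emp_id    NUMBER,
          hire_date DATE,
          hire_year NUMBER GENERATED ALWAYS AS (EXTRACT(YEAR FROM hire_date)) VIRTUAL
        )
        PARTITION BY RANGE (hire_year) (
          PARTITION p_old VALUES LESS THAN (2000),
          PARTITION p_new VALUES LESS THAN (MAXVALUE)
        );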
    • 36. Rows Gotta Go
    • 37. Row Removal Options
        • SQL DELETE
        • CTAS / DROP / RENAME
        • TRUNCATE
        • Row Marking
    • 38. SQL DELETE
        • Good for small number of rows
        • RI handled automatically
        • Oracle was born to DELETE, better than any PL/SQL that you write.
        • Issue with Un-indexed Foreign Keys 
    • 39. DELETE Optimization
        • Work in batches, committing (only when programmatically DELETING)
        • Use parallel DML (Partitioned tables only)
        • Drop Indexes before (if possible)
        • Index FK columns
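      A hedged sketch of a batched DELETE against a hypothetical BIG_TABLE (the batch size and retention predicate are arbitrary):

        BEGIN
          LOOP
            DELETE FROM big_table
             WHERE created_date < ADD_MONTHS(SYSDATE, -84)
               AND ROWNUM <= 10000;
            EXIT WHEN SQL%ROWCOUNT = 0;   -- stop when nothing is left to delete
            COMMIT;                       -- commit each batch to limit undo/redo per transaction
          END LOOP;
          COMMIT;
        END;
        /

        -- For partitioned tables, parallel DML may be used instead of batching:
        -- ALTER SESSION ENABLE PARALLEL DML;
        -- DELETE /*+ PARALLEL(big_table, 4) */ FROM big_table
        --  WHERE created_date < ADD_MONTHS(SYSDATE, -84);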
    • 40. CTAS
        • Works well for PURGE, not archive
        • Perfect when you want to keep low percentage of rows in the table
        • Doesn’t handle RI – no DELETE was issued.
        • Process
          • Create table with rows you want to keep
          • Drop old table
          • Rename table
          • Recreate indexes
        • create table new_table unrecoverable as select * from old_table where ...
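      A hedged sketch of the full CTAS process against a hypothetical ORDER_HIST table (predicate, index, and names are illustrative):

        CREATE TABLE order_hist_keep UNRECOVERABLE AS
          SELECT * FROM order_hist
           WHERE order_date >= ADD_MONTHS(SYSDATE, -24);   -- the rows you want to keep

        DROP TABLE order_hist;
        RENAME order_hist_keep TO order_hist;
        CREATE INDEX order_hist_ix1 ON order_hist (order_date);
        -- Re-create constraints, grants, and triggers, and re-gather statistics as needed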
    • 41. TRUNCATE
        • Congratulations if your application lends itself to TRUNCATE without losing new data
        • What about RI?
        • May truncate or drop individual partitions
    • 42. DROP
        • DROP PARTITION
        • What would you do before you drop it?
        • Exchange partition with table
        • Transportable tablespace.
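      A hedged sketch, assuming a range-partitioned SALES table (partition and table names illustrative):

        -- Swap the partition's rows into a standalone table first, then drop the empty partition
        ALTER TABLE sales EXCHANGE PARTITION p_2001 WITH TABLE sales_2001_arch;
        ALTER TABLE sales DROP PARTITION p_2001;

        -- Or simply empty a partition in place once its data has been archived elsewhere
        ALTER TABLE sales TRUNCATE PARTITION p_2002;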
    • 43. Things to Remember
      • Benchmark the best way for you
      • Benchmark against real data if possible
      • Use parallel DML
    • 44. Design Summary
        • Create an architecture that lends itself to aging, archiving, deleting
        • This architecture should compensate for business requirements
          • For instance, customer orders not accessible after 6 months … or
          • top query performance needed for all ‘ACTIVE’ accounts … etc
        • Implement it – THE EASY PART
    • 45. Post Archive Challenges
    • 46. Post Archive Challenges. “I have successfully deleted 10 billion rows from the table. HoooAhhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars.”
    • 47. … 2 Days Later …
    • 48. Post Archive Challenges. Hmmmmm. It looks like…
        • Queries are not any faster
        • The SELECT COUNT(*) took the same amount of time
        • Space was not freed in Oracle (DBA_FREE_SPACE)
        • Space was not freed in the operating system
      WHY NOT????? Where are the benefits???
    • 49. From Swiss to Provolone. (Images: After DELETE; After Maintenance.)
    • 50. Post Archive Challenges
        • Statistics are not fresh
        • High Water Marks are very high
        • Space has not been freed within Oracle (if that’s what you want)
        • Space has not been freed to the OS
    • 51. Refresh Statistics
        • Help the optimizer, easy enough
        • dbms_stats provides many options
    • 52. Automatic Stats
        • Recommended by Oracle
        • Calls DBMS_STATS_JOB_PROC
        • Enabled via:
          BEGIN
            dbms_auto_task_admin.enable(
              client_name => 'auto optimizer stats collection',
              operation   => NULL,
              window_name => NULL);
          END;
          /
    • 53. When do you go manual ?
        • High transaction DELETEs or TRUNCATEs
        • Bulk loads which add more than 10% of table size
      • So there’s our answer – go manual.
    • 54. How do we Gather Them?
        • NOT the Analyze Command
        • Instead DBMS_STATS package
        • exec dbms_stats.gather_table_stats(ownname => 'BDB', tabname => 'MASTER', estimate_percent => dbms_stats.auto_sample_size);
    • 55. High Water Mark
    • 56. High Water Mark
    • 57. Reset High Water Mark (HWM)
        • DROP or TRUNCATE
        • Multiple OTHER ways to do this depending on version
        • In v9 … alter table move tablespace [tsp name];
          • Row movement must be enabled
          • Tablespace must be a LMT
          • Can move into same tablespace
          • Will occupy 2X space temporarily
          • Must then rebuild indexes
        • In v10 … alter table <table_name> shrink space;
    • 58.
      • Freeing Allocated Space
    • 59. Create table, check space
      SQL> create table space_example as select * from dba_source;
      Table created.

      SQL> select count(*) from space_example;
        COUNT(*)
      ----------
          296463

      SQL> exec dbms_space.unused_space('DAVE', 'SPACE_EXAMPLE');
      Total blocks: 6328
      Unused blocks: 1
      Unused bytes: 8192
      Last Used Block: 55
      Last Used Block ID: 10377
      Last Used Ext File ID: 4
    • 60. Check datafile space
      FILE_NAME                                          Poss. Size  Current Size  Poss. Savings
      -------------------------------------------------- ----------  ------------  -------------
      /export/home/ora102/oradata/ora102/qasb001.dbf             29            46             17
      /export/home/ora102/oradata/ora102/example01.dbf           69           100             31
      /export/home/ora102/oradata/ora102/qasb002.dbf             41            41              0
      /export/home/ora102/oradata/ora102/system01.dbf           493           500              7
      /export/home/ora102/oradata/ora102/sysaux01.dbf           430           430              0
      /export/home/ora102/oradata/ora102/undotbs01.dbf           91           175             84
      /export/home/ora102/oradata/ora102/users01.dbf             44            83             39
      /export/home/ora102/oradata/ora102/test.dbf                51            70             19
    • 61. Delete rows, check space
      SQL> delete from space_example;
      296463 rows deleted.
      SQL> commit;

      SQL> exec dbms_space.unused_space('DAVE', 'SPACE_EXAMPLE');
      Total blocks: 6328
      Unused blocks: 1
      Unused bytes: 8192
      Last Used Block: 55
      Last Used Block ID: 10377
      Last Used Ext File ID: 4

      Nothing changed!
    • 62. Shrink it, check space
      SQL> alter table space_example enable row movement;
      SQL> alter table space_example shrink space;

      SQL> exec dbms_space.unused_space('BDB', 'SPACE_EXAMPLE');
      Total blocks: 8
      Unused blocks: 4
      Unused bytes: 32768
      Last Used Block: 4
      Last Used Block ID: 5129
      Last Used Ext File ID: 4

      Space freed from the table, but still in Oracle.
    • 63. Check space again
      FILE_NAME                                          Poss. Size  Current Size  Poss. Savings
      -------------------------------------------------- ----------  ------------  -------------
      /export/home/ora102/oradata/ora102/qasb001.dbf             29            46             17
      /export/home/ora102/oradata/ora102/example01.dbf           69           100             31
      /export/home/ora102/oradata/ora102/qasb002.dbf             41            41              0
      /export/home/ora102/oradata/ora102/system01.dbf           493           500              7
      /export/home/ora102/oradata/ora102/sysaux01.dbf           430           430              0
      /export/home/ora102/oradata/ora102/undotbs01.dbf          171           175              4
      /export/home/ora102/oradata/ora102/users01.dbf             44            83             39
      /export/home/ora102/oradata/ora102/test.dbf                 1            70             69

      This datafile should be resized to save 69 MB:
      SQL> alter database datafile '/export/home/ora102/oradata/ora102/test.dbf' resize 1m;
    • 64. Free the Space
        • Space is still reserved for future inserts and updates, just not freed back to the OS
        • Space will not be automatically freed – confirm by checking DBA_FREE_SPACE
        • Ways to set it free
          • drop
          • truncate
          • alter table move …
          • alter table shrink space …
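      To confirm what was (or was not) released, a simple check of free space by tablespace from DBA_FREE_SPACE, run before and after the maintenance operation:

        SELECT tablespace_name,
               ROUND(SUM(bytes) / 1024 / 1024) AS free_mb
          FROM dba_free_space
         GROUP BY tablespace_name
         ORDER BY tablespace_name;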
    • 65. Unindexed Foreign Keys Example. (Diagram: PARENT table, COL1, 1 million rows; CHILD table, COL1_PARENT foreign key with ON DELETE CASCADE, 1 million rows; the script Fky.sql issues SQL> DELETE FROM PARENT WHERE COL1 < 1000;)
    • 66. Before Index
      delete from parent where col1 < 1000

      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.01       0.08          2         27          0           0
      Execute      1      0.90       0.80          4    2208799       6062         999
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        2      0.91       0.88          6    2208826       6062         999

      delete from "DAVE"."CHILD" where "COL1_PARENT" = :1

      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute    999    285.94     293.11    1543900    2208789       1029         999
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total     1000    285.94     293.11    1543900    2208789       1029         999
    • 67.
      delete from parent where col1 < 1000

      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute      1      0.53       0.47          7         13       7053         999
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        2      0.53       0.47          7         13       7053         999

      delete from "DAVE"."CHILD" where "COL1_PARENT" = :1

      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute    999      0.42       0.46          2       3002       4058         999
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total     1000      0.42       0.46          2       3002       4058         999

      SQL> create index prnt_ndx on child(col1_parent);
    • 68. Unindexed Foreign Keys
        • Problem is not limited to DELETE statements
        • Search database for unindexed FK columns
        • Script is on asktom
          • Search for unindex.sql
    • 69. Summary Points
        • Create sound Archiving strategy based on Oracle technical features as well as business and/or legal requirements
        • Leverage partitioning
        • Move partitions to cheap disk when appropriate
        • Make partitions ‘read only’ and compressed
        • Remove data via DROP or TRUNCATE if possible
        • If SQL DELETE, make sure to perform maintenance operations
        • Consider 3rd party solutions
    • 70. Questions? “Well done is better than well said.” Ben Franklin