Scalar Unstructured Data, April 28, 2010

1. Unstructured Data: Managing Growth of Unstructured Data. Michael Traves, Chief Architect, Data Management, michael.traves@scalar.ca

3. Unstructured Data

4. Challenges

5. Approaches

6. Solutions

7. Case Study – Vale Inco

8. Tom Morrier

9. Next Steps

10. Unstructured Data Assessment

11. Activity: Demonstration @ ScalarLabs TGIF Session

12. Questions & Answers

14. Technically led organization specializing in the design, deployment and management of complete IT Infrastructures

17. With our customers and at our own data centres using proven architectures and solutions

18. End-to-end Consulting

19. From up-front assessments to long-term architecture considerations

20. Holistic Vision

21. Scalar designs, deploys and manages the entire IT stack, including eco considerations: System Implementation, Capacity Planning, Health Checks, Storage and System Consolidation, Converged Network Infrastructure

23. Multiple data centre hosting facilities, plus full remote management offerings at customer sites

24. Virtualized offerings include:

25. Cloud computing for primary or dev/test environment

26. Remote VMs / hosted DR at multiple sites

28. The Data Management Challenge

29. The Traditional Infrastructure Problem

31. Storage environments becoming increasingly complex and difficult to manage

32. Inconsistent utilization of storage resources

33. Skyrocketing storage and backup costs

34. Lengthy data migrations and consolidations

35. Backup times that exceed backup windows

37. Most companies are still managing growth reactively. Where do you put new data when your filesystems fill up?

38. If you aren’t able to dynamically increase the size of a file system (pooling, thin provisioning, etc), how do you move data between filesystems/servers without impacting users?

39. When you need to increase capacity, how long does it usually take to acquire, deploy and provision it? Do you play the data “shell” game until it’s ready?

41. In high file count environments, you have a metadata problem, not a data problem.

42. Lots of small files complicate management strategies

43. Archiving, while one strategy to address data growth, actually increases file counts (stubs), creating more of a problem

44. Backup and recovery of high file count filesystems are complex – “walking a filesystem” is usually an order of magnitude more time consuming than actually moving the data.

46. File system backups are sequential (one job per filesystem) and take time. Multiple filesystems create management headaches.

47. Full backups of large amounts of data take time and chew up resources (either D2D, Tape, or Dedupe).

48. Most data doesn’t change week to week (80%+ is aged, static)

49. Large file counts create disk I/O constraints

50. A 72hr backup job can typically be 95% metadata processing and 5% data movement.
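The file-count and static-data points above can be checked against a real share before choosing a strategy. The following is a minimal sketch, not a tool from the presentation: the function name `profile_tree`, the 90-day "static" threshold and the 1 KB "small file" threshold are all assumptions chosen for illustration. Note that the walk itself is pure metadata work, which is the cost the slides describe.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class TreeProfile:
    file_count: int = 0
    total_bytes: int = 0
    small_files: int = 0    # files smaller than small_bytes
    static_files: int = 0   # files not modified within static_days
    static_bytes: int = 0


def profile_tree(root, static_days=90, small_bytes=1024, now=None):
    """Walk `root` and summarize file count, size and age.

    static_days and small_bytes are illustrative thresholds (assumptions),
    not values taken from the slides.
    """
    now = time.time() if now is None else now
    cutoff = now - static_days * 86400
    profile = TreeProfile()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            profile.file_count += 1
            profile.total_bytes += st.st_size
            if st.st_size < small_bytes:
                profile.small_files += 1
            if st.st_mtime < cutoff:
                profile.static_files += 1
                profile.static_bytes += st.st_size
    return profile
```

Dividing `static_bytes` by `total_bytes` gives a rough measure of how much of a share is aged data, and `small_files` relative to `file_count` indicates how metadata-heavy a backup of that share is likely to be.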

52. Storage is typically on a three year life cycle – which generally means four, if you account for migration in and migration out

53. How do you migrate large volumes of data between old and new storage platforms without impacting users?

54. How do you migrate between different types of technologies? E.g., NetApp to EMC, EMC to BlueArc, Windows/UNIX to NAS?

56. Managing multiple solutions is typical with unstructured data – UNIX (NFS) and Windows (CIFS) typically coexist. NAS appliances or gateways come into play when UNIX/Windows can’t scale

57. Having multiple protocols across multiple fileshares, on multiple servers/NAS solutions, creates management complexity. Whether each platform can grow/scale to meet demand is difficult to predict, and each requires a different strategy for managing growth

59. Scale-up strategies leverage the same server/NAS platform by adding capacity. This minimizes management overhead, assuming that filesystems can dynamically be scaled online.

60. This assumes that the existing system can sustain performance growth too

61. Scale-out strategies couple storage capacity with performance, ideally using the same building block for consistency. This is predictable, but creates allocation issues

62. Can a single fileshare span multiple devices? How are data and performance distributed?

64. How do you manage capacity, when data on different devices grows at different rates?

65. How do you manage performance, when access patterns are unpredictable?

66. Is it possible to redistribute content between filesystems and devices to “optimize” utilization? How does this impact users?

69. Archiving

70. Bigger is Better

71. Tiering

72. Deduplication

74. Pros

75. Limits the amount of data people can store in public folders

76. Cons

77. People always find places to store their data (desktop/laptops, external drives, etc) – usually outside the control and protection of IT

78. Drives helpdesk complaints, and constant “exceptions”

79. Does not address project/departmental folders

81. Pros

82. Reduces primary storage usage, and associated costs

83. Reduces backup volumes, reducing backup tape/disk usage

84. Cons

85. Requires stubs (for no user impact), which does not reduce file counts

86. Increasing file counts while decreasing data does not solve the backup problem – millions of files/stubs still take hours/days to process

88. Cons

89. More devices mean more to manage. How do you organize them?

90. When 80%+ of your data is static, how do you separate it from current/new data without impacting users?

91. More primary storage creates more costs, and more backup/recovery pain

93. Pros

94. Helps manage cost by prioritizing data placement

95. Cons

96. How do you decide what should go where?

97. What if priority or access patterns change?

99. Pros

100. Deduplication can dramatically reduce the storage footprint for many types of data, promising lower storage costs long-term

101. Cons

102. Not all data deduplication is created equal. Is it block level, file level, or variable block level? What is the performance impact?

103. More efficient storage of static data is good, but if it’s still in the backup/recovery cycle, have you really addressed the problem?

105. Pros

106. Allows you to back up the replication target, instead of the source

107. Gets you a DR solution while moving the backup issue offsite

108. Cons

109. Active and Static data is still mixed, with the same policies and retentions being applied to each

112. Get better utilization out of your storage resources

113. Utilize storage policies to better manage and optimize use of storage devices

114. Easily add and manage storage policies for all devices from a single management console

115. Reduce overall storage costs by 50 to 80%

116. Cut migration times by up to 90% with zero impact to users during migration

118. Balance data and I/O across multiple storage devices, making the most efficient use of your storage resources

119. Data Migration

120. Automatically migrate data between heterogeneous devices, without impacting user access – no downtime

121. Storage Tiering

123. A Global NameSpace (GNS) has the unique ability to aggregate disparate and remote network-based file systems, providing a consolidated view that can greatly reduce the complexities of localized file management and administration. For example, prior to file system namespace consolidation, two servers exist and each represents its own independent namespace; e.g. \\server1\share1 and \\server2\share2. Various files exist within each share, but users have to access each namespace independently. This becomes an obvious challenge as the number of namespaces grows within an organization.
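The idea of a consolidated view can be illustrated with a small lookup table that maps one virtual tree onto the two independent shares from the example. This is only a sketch under stated assumptions: the class `GlobalNamespace`, its methods, the virtual paths under `/corp`, and the UNC spellings of the two shares are all hypothetical, and a real file virtualization appliance does this in the data path rather than in client code.

```python
class GlobalNamespace:
    """Maps virtual path prefixes to the physical share that holds them."""

    def __init__(self):
        self._mounts = {}  # virtual prefix -> physical share root

    def mount(self, virtual_prefix, physical_root):
        """Attach (or re-point) a virtual prefix to a physical share."""
        self._mounts[virtual_prefix.rstrip("/")] = physical_root.rstrip("\\")

    def resolve(self, virtual_path):
        """Return the physical path for a virtual path (longest prefix wins)."""
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if virtual_path == prefix or virtual_path.startswith(prefix + "/"):
                rest = virtual_path[len(prefix):].replace("/", "\\")
                return self._mounts[prefix] + rest
        raise KeyError(f"no share is mounted for {virtual_path!r}")


# Hypothetical layout: users see one tree, the data lives on two servers.
gns = GlobalNamespace()
gns.mount("/corp/engineering", r"\\server1\share1")
gns.mount("/corp/finance", r"\\server2\share2")
physical = gns.resolve("/corp/finance/q1/budget.xls")
```

Because users only ever see the virtual path, re-pointing `/corp/finance` at a different share with another `mount` call changes where the data lives without changing how it is addressed.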

126. Make the best use of your current and future storage capacity

127. Eliminate the need to manually rebalance data – use automated, policy driven tools instead
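One simple form of policy-driven placement is to send new data to whichever device would end up least utilized. The sketch below assumes a hypothetical `pick_device` helper and made-up device names and capacities; it considers capacity only, while a real policy would also weigh I/O load.

```python
def pick_device(devices, size_gb):
    """Pick the device with the lowest utilization after placing size_gb.

    devices: dict of name -> (used_gb, capacity_gb). Returns a device name,
    or None if nothing has room. Names and numbers are illustrative.
    """
    best, best_util = None, None
    for name, (used, capacity) in sorted(devices.items()):
        if used + size_gb > capacity:
            continue  # would overfill this device
        util = (used + size_gb) / capacity
        if best_util is None or util < best_util:
            best, best_util = name, util
    return best


devices = {
    "nas_a": (800, 1000),
    "nas_b": (300, 1000),
    "nas_c": (90, 100),
}
target = pick_device(devices, 50)
```

Running the same check on a schedule, rather than when a file system fills up, is what turns rebalancing from a reactive task into a policy.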

130. Transitioning from one generation of technology to another is now a scheduled task, not a 6 month project

131. Keep your vendors competitive – without the pain of data migration projects, your choice of solution comes down to features and costs. Why pay more by being locked in?

134. Keep current data on faster, regularly backed up storage, while segregating static, older content that isn’t changing to lower tiers

135. Eliminate backup of over 80% of your data by cycling it out of the regular backup scheme
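The effect of cycling static data out of the regular backup scheme can be estimated with simple arithmetic. The sketch below is an assumption-laden model, not a figure from the presentation: it counts only full backups, assumes weekly fulls for active data and quarterly fulls for static data, and uses a hypothetical 10 TB environment.

```python
def yearly_full_backup_tb(total_tb, static_fraction,
                          active_fulls_per_year=52,
                          static_fulls_per_year=4):
    """Estimate TB written by full backups in a year.

    Active data gets a full backup active_fulls_per_year times; static data
    gets one static_fulls_per_year times. All defaults are illustrative.
    """
    static_tb = total_tb * static_fraction
    active_tb = total_tb - static_tb
    return (active_tb * active_fulls_per_year
            + static_tb * static_fulls_per_year)


# Everything in the weekly cycle vs. 80% static data backed up quarterly.
before = yearly_full_backup_tb(10, 0.0)
after = yearly_full_backup_tb(10, 0.8)
```

Under these assumptions the yearly full-backup volume drops from 520 TB to roughly 136 TB. Incremental backups, deduplication and retention rules would change the absolute numbers but not the direction.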

137. Storage Tiering – Granular Value-based Policy

139. Utilize your existing storage assets better

140. Optimize access performance and eliminate issues that impact user productivity (scheduled and unscheduled)

141. Pool the resources of servers and NAS appliances you already own, achieving better asset utilization and realized cost savings

142. Eliminate downtime and reconfiguration activities. Enable non-disruptive data management

144. No client reconfiguration – with a virtualized, global name space, the location of data is policy and administrator controlled. Moving data around does not impact access to it.

145. Move entire file systems or individual files around without interrupting access to them.

146. Reduce the overhead of migration projects with a streamlined, consistent, automated solution.

148. Reduce your storage costs by putting data on the right (cost) tier of storage – automated, policy driven.

149. Reduce your backup volumes dramatically by moving aged data out of the daily/weekly/monthly backup cycle. Back static data up once a quarter or less, with proper retention practices.

150. Tiering without Administrative overhead. Automate the challenge of what goes where, and save yourself the trouble.
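An automated "what goes where" rule can be as simple as an age threshold on last-modified time. The following is a minimal sketch of such a policy: the tier names, the 90-day threshold, and the functions `choose_tier` and `plan_moves` are assumptions for illustration and do not describe any particular product's policy engine.

```python
from datetime import datetime, timedelta

# Hypothetical policy: thresholds and tier names are illustrative only.
TIER_POLICY = [
    (timedelta(days=90), "tier1"),  # recently modified: fast, backed up often
]
DEFAULT_TIER = "tier2"              # everything older: cheaper, backed up rarely


def choose_tier(last_modified, now):
    """Return the tier a file belongs on, based on its age."""
    age = now - last_modified
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return DEFAULT_TIER


def plan_moves(files, now):
    """files: iterable of (path, current_tier, last_modified).

    Returns a list of (path, from_tier, to_tier) for files on the wrong tier.
    """
    moves = []
    for path, current, last_modified in files:
        target = choose_tier(last_modified, now)
        if target != current:
            moves.append((path, current, target))
    return moves
```

Because the rule is re-evaluated on every run, a file that becomes active again is planned back to the faster tier, which addresses the concern about changing access patterns.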

153. Backup Windows

154. Impact to production during backups

155. Too much data, high growth, archiving partially implemented

156. How we helped

157. Information Life Cycle Management Assessment

158. Reviewing all aspects of data in their environment

159. Current State Analysis

160. Future State Recommendations

161. Technology and Design Recommendations

163. Killing 5 Birds with one Appliance

164. Our Problem(s): Extremely large volumes of data growing out of control; millions of files, many of them under 1 KB in size; aging, end-of-life data archiving solution; 5-day backup times; backups were running during business hours; small change windows to take outages in a 24-hour operation that does not like downtime.

165. The Solution: Two pairs of ARX 4000s: one pair in our primary DC, one pair in our largest site.

166. How We Used the ARX (diagram labels: Tier 1, Tier 2; capacities shown: 5 TB, 3 TB, 2.5 TB, 4 TB)

167. The Results: Backup times went from 98 hours to 28 hours. 5 streams have been turned into 14 streams, 4 of which only happen once a month. In the primary DC, backup times went from 110 hours for 1 full backup to 21 hours over 5 streams for the same full. Archiving has been undone in one site and is under way in the other. Re-archive based on change through tiering. All data moves were done during business hours without impacting user data access.

168. Some Bonus Results: Tape usage has gone down thanks to tiering; data types can be isolated; old systems still accessing network storage surface; strange connections get identified; MP3 library gets a boost!

169. Questions?

171. We’ll look at all aspects of your data environment (online, nearline, offline), processes, and applications, and provide guidance on how to get from “current state” to your desired “future state” given the challenges specific to your business.

172. Unstructured Data Targeted – “Quick Assessment”

174. Through a ½ day workshop, we will gather information about your processes, policies, infrastructure and challenges.

175. A data collection tool will be installed (non-invasive) to capture the metadata for the target filesystem shares (server/NAS)

176. Analysis

177. The captured data will be analyzed to determine what efficiencies would be realized, and the best design case

178. Presentation of Results

179. We will present the results of the analysis, along with recommendations on how to realize the benefits of file virtualization

180. A mapping of benefits to your specific challenges will help build an ROI/TCO for business justification

183. Justification through soft and hard dollar cost savings will be presented to help establish a business case for deployment in your environment

184. Costs: Free to Session Attendees. Because we believe this solution can be proven out as a cost-effective, highly impactful way of managing unstructured data growth, we are presenting this 2 ½ day engagement free of charge.

186. Learn how file virtualization can benefit your environment

187. Free of charge for attendees who complete the survey

188. Inquire for additional information (see handout)

189. Complete your Survey

190. Be sure to complete the survey for your chance to win a Netbook!

191. Beer Tasting

192. Join us for a sampling of Duggin’s Beer
