
Solving Challenges With 'Huge Data'

Solutions & Client cases


  1. Solving Challenges with 'Huge Data': Solutions & client cases. Dr. Axel Koester (axel.koester@de.ibm.com), Chief Technologist, EMEA Storage Competence Center; Chairman of the TEC think tank D/A/CH
  2. Three ways IT uses data today: procedural (if…then), statistical (big data), and machine learning. (Images: Business over Broadway; opendatascience.com)
  3. … and in 10 years: procedural (if…then), statistical (big data), and machine learning. (Images: Business over Broadway; opendatascience.com)
  4. Current examples:
     • Procedural (classic/legacy IT): manual modelling, business as usual
     • Statistical: accumulation of examples; shopping, profiling, fraud detection …
     • Machine learning: automatic modelling; autonomous driving, image classification, chatbots, gaming …
     (Images: Business over Broadway; opendatascience.com)
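
To make the "manual modelling" vs. "automatic modelling" distinction concrete, here is a minimal sketch of the same toy fraud check written both ways. All features, thresholds and data below are invented for illustration.

    # Manual modelling: an expert writes the rule by hand.
    def procedural_fraud_check(amount: float, country_changed: bool) -> bool:
        return amount > 5000 and country_changed

    # Automatic modelling: an equivalent rule is learned from labelled examples.
    from sklearn.tree import DecisionTreeClassifier

    X = [[100, 0], [7000, 1], [6000, 1], [50, 1], [8000, 0]]  # [amount, country_changed]
    y = [0, 1, 1, 0, 0]                  # 1 = fraud, as judged in past cases
    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict([[9000, 1]]))    # the model reproduces the expert's rule
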
  5. Example of trained (rather than programmed) quality inspection. [Image: sample parts labelled "OK" or "defect"]
  6. Train on the job by reviewing low-confidence cases: much cheaper than re-coding at every production change.
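
A minimal sketch of this "train on the job" loop, assuming a scikit-learn-style classifier; the features, threshold and data are made up, and ask_inspector stands in for the human review step:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    CONFIDENCE = 0.90  # below this score, a human inspector reviews the part

    def ask_inspector(features):
        # Placeholder for the human review step (a review UI in production).
        return int(input(f"0=OK, 1=defect for part {features}? "))

    # Tiny made-up training set: [scratch_score, hole_score] per inspected part.
    X_train = np.array([[0.1, 0.0], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]])
    y_train = np.array([0, 1, 0, 1])            # 0 = OK, 1 = defect
    model = LogisticRegression().fit(X_train, y_train)

    def review_batch(X_batch):
        """Classify a batch; route low-confidence parts to a human, retrain."""
        global X_train, y_train
        proba = model.predict_proba(X_batch)
        labels = proba.argmax(axis=1)
        for i in np.where(proba.max(axis=1) < CONFIDENCE)[0]:
            labels[i] = ask_inspector(X_batch[i])       # human corrects/confirms
            X_train = np.vstack([X_train, X_batch[i]])  # keep the example
            y_train = np.append(y_train, labels[i])
        model.fit(X_train, y_train)  # adapt to the product change, no re-coding
        return labels
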
  7. How is data stored?
     • Procedural: archive test cases for auditing (if…then…else)
     • Statistical: parallel search across many stored samples (GB/s)
     • Machine learning: train on sample data, then archive or trade the data
     (Images: Business over Broadway; opendatascience.com)
  8. Imperatives for data storage: implement workflows, avoid "data tourism", scale without effort.
  9. DESY: an example of a solved "data tourism" problem.
  10. DESY data: synchrotron X-ray imaging.
  11. Data tourism avoided at DESY: the Lambda detector delivers 60 Gb/s @ 2000 Hz and the Eiger detector 30 Gb/s @ 2000 Hz, i.e. 2000 files/s per camera, streamed through a ØMQ cluster into IBM Spectrum Scale with workflow rules; 3D reconstruction and research calculus run on the lifecycle cluster, with results available via web portal access.
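
The point of the workflow rules is that compute moves to the data instead of the data being copied between sites. A minimal sketch of that idea, with hypothetical paths, file suffix and job hook (IBM Spectrum Scale has built-in policy and event mechanisms for this; a polling loop is used here only to keep the example self-contained):

    import time
    from pathlib import Path

    LANDING = Path("/gpfs/detectors/incoming")   # hypothetical landing directory
    SEEN = set()

    def submit_reconstruction(f: Path):
        # Placeholder: in production this would enqueue a 3D-reconstruction job
        # on the compute cluster that mounts the same filesystem.
        print(f"queue 3D reconstruction for {f}")

    while True:
        for f in LANDING.glob("*.cbf"):          # hypothetical detector format
            if f not in SEEN:
                SEEN.add(f)
                submit_reconstruction(f)         # compute moves to the data
        time.sleep(0.5)
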
  12. [Next-gen storage] A prototype wrote 50,000 files/sec into a single folder*:

         -- started at 02/28/2017 12:13:13 --
         mdtest-1.9.3 was launched with 14 total task(s) on 14 node(s)
         Command line used: /ghome/oehmes/mpi/bin/mdtest-pcmpi9131-existingdir
           -d /gpfs/fs2-1m-me1/shared/mdtest-ec -i 1 -n 35000 -F -w 0 -Z -p 8
         Path: /gpfs/fs2-1m-me1/shared
         FS: 17.1 TiB   Used FS: 0.1%   Inodes: 476.8 Mi   Used Inodes: 0.1%
         14 tasks, 490000 files

         SUMMARY: (of 1 iterations)
         Operation       Max          Min          Mean         Std Dev
         ---------       ---          ---          ----         -------
         File creation   50032.690    50032.690    50032.690    0.000
         File stat       3937604.341  3937604.341  3937604.341  0.000
         File read       941193.073   941193.073   941193.073   0.000
         File removal    143095.519   143095.519   143095.519   0.000
         Tree creation   77672.296    77672.296    77672.296    0.000
         Tree removal    0.239        0.239        0.239        0.000
         -- finished at 02/28/2017 12:13:39 --

     (*) Writing into independent folders, the test cluster managed 2.6 million 32 KB files/sec.
  13. More workflow examples.
  14. Workflow automation: preserving crime evidence data. For newly acquired evidence data:
     • an immutable copy is generated automatically before the investigation starts
     • life-cycle management is adjusted to the investigation's requirements
     • life-cycle management of the immutable copy is fully automated (as required by law)
     Workflow included + immutability included.
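
A minimal sketch of the "immutable copy" step using the S3 object-lock API, assuming the repository supports compliance-mode retention (cf. the "SEC-legal retention mode + deletion hold" on slide 19); the endpoint, bucket and retention period are hypothetical:

    import hashlib
    from datetime import datetime, timedelta, timezone
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

    def preserve_evidence(path: str, case_id: str) -> str:
        with open(path, "rb") as fh:
            data = fh.read()
        digest = hashlib.sha256(data).hexdigest()      # fixate the content
        key = f"{case_id}/{digest}"
        s3.put_object(
            Bucket="evidence",                         # hypothetical bucket
            Key=key,
            Body=data,
            ObjectLockMode="COMPLIANCE",               # nobody can delete early
            ObjectLockRetainUntilDate=datetime.now(timezone.utc)
                                      + timedelta(days=10 * 365),
        )
        return digest
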
  15. Workflow automation: handling connected documents with IBM AREMA (Archive and Essence Manager). Heavily used in broadcasting, but also for:
     • CCTV (highlighting, automatic archiving & deletion)
     • medical tomography scans
     • fingerprint processing (association, feature extraction, distribution)
     • legal rich-media document processing
     … and many more.
  16. The mother of all data projects: the Square Kilometre Array (SKA).
  17. Radio interferometry data capture: the Square Kilometre Array (SKA) will be the world's largest radio telescope:
     • 900 stations
     • 300 antennas per station
     • construction planned to begin in 2018
     Substantial technological challenges:
     • 160 terabytes of raw data collected per second
     • 1 petabyte of data stored per day
     • 1,000 petaflops of processing power
     IBM's R&D involvement since 2012:
     • research collaboration with ASTRON (Netherlands Institute for Radio Astronomy)
     • storage aspects
     • ExaPlan: a planning tool for multi-tiered exascale storage
     • tape library modelling and simulation
     • predictive caching
     [Image: artist's rendering of the SKA]
  18. For everyone else: build your private cloud foundation.
  19. An S3-compatible private cloud as "everybody's offload storage": driven by public-cloud pricing, it reduces cost by improving storage footprint efficiency. The organization-wide S3-compatible repository is IBM Cloud Object Storage (an x86 image containing the OS, also available as appliances), fed from many sides:
     • IBM Spectrum Virtualize over multi-vendor block storage: offload snapshots and stale volumes
     • IBM Spectrum Scale and IBM file clusters (NAS: SMB/CIFS, NFS, POSIX, HDFS; on flash, disk and tape): offload old files and snapshots
     • IBM Spectrum Protect and other backup software: cloud backup and cloud-to-cloud migration
     • systems, VMs and users: archive with SEC-legal retention mode + deletion hold per object
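
A minimal sketch of the "offload old files" idea against any S3-compatible endpoint; in practice the tiering is done by the storage products named above, and the endpoint, bucket, path and age threshold here are hypothetical:

    import time
    from pathlib import Path
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://cos.internal.example.com")
    STALE_AFTER = 180 * 24 * 3600  # 180 days without access

    for f in Path("/nas/projects").rglob("*"):  # hypothetical NAS export
        if f.is_file() and time.time() - f.stat().st_atime > STALE_AFTER:
            s3.upload_file(str(f), "offload", str(f.relative_to("/nas")))
            # A production workflow would leave a stub behind and verify the
            # upload before deleting the original; omitted here for brevity.
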
  20. All-or-Nothing Transform (AONT) for safety, reliability and security: 5 nines write availability, 6 nines read availability, 15+ nines reliability against data loss (3 sites). IBM Cloud Object Storage (x86 image containing the OS) applies a geographical information dispersal algorithm, e.g. "encode the data into 12 slices; any 7 slices suffice for decoding". An individual slice on its JBOD is undecipherable on its own.
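
A back-of-the-envelope check of the "12 slices, any 7 decode" example: if each slice server is independently available with probability p (the p values below are assumptions, not figures from the slide), the object stays readable whenever at least 7 of the 12 slices respond:

    from math import comb

    def read_availability(p: float, n: int = 12, k: int = 7) -> float:
        """Probability that at least k of n independent slices are available."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    for p in (0.95, 0.99):
        print(f"p={p}: {read_availability(p):.12f}")
    # Even with only 95%-available slice servers, the object remains readable
    # with very high probability; the erasure coding provides the extra nines.
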
  21. How Sky avoids bottlenecks, service outages and hacking: object access is lightweight & secure, resulting in a low CPU footprint & cost. [Diagram: browser obtains the object ID of a movie]
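
One common way to implement this pattern (an illustration, not necessarily Sky's actual setup) is to hand the browser a short-lived presigned URL for the object ID, so the video bytes stream straight from the object store rather than through the application servers; the endpoint and bucket are hypothetical:

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://cos.example.com")

    def movie_url(object_id: str) -> str:
        """Return a temporary direct-download link for one movie object."""
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": "movies", "Key": object_id},
            ExpiresIn=3600,  # link expires after one hour, limiting abuse
        )
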
  22. Bonus: artificial intelligence research for storage management.
  23. AI learns to predict ideal storage based on meta-information. (G. Cherubini, J. Jelitto, V. Venkatesan, "Cognitive Storage for Big Data", Computer, April 2016)
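
In the spirit of that paper, a toy sketch of metadata-based placement: train a classifier on per-file metadata and let it suggest a tier for new files. The features, labels and numbers are made up:

    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical features: [size_MB, age_days, accesses_last_90d, owner_group]
    X_train = [[2048, 400, 0, 3], [1, 2, 57, 1], [300, 30, 4, 2]]
    y_train = ["tape", "flash", "disk"]  # tiers chosen by past lifecycle outcomes

    model = RandomForestClassifier().fit(X_train, y_train)
    print(model.predict([[500, 200, 1, 3]]))  # suggested tier for a new file
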
  24. Data life-cycle prediction based on experience with the life cycles of different data types. Prediction quality when training on 10% of the data: 95% success even in the worst case (a data class with low predictability).
  25. Data prioritization: predicting the recovery order after a blackout. Each data set's recovery relevance is predicted (synchronous? consistent? expendable?), ranging from important transactions where no loss is tolerated down to temp data. [Diagram: predicted relevance over time t after recovery R]
  26. ibm.biz/AxelKoester
  27. Quantum computer: "Nobody needs one at home." (Ken Olsen, founder of Digital Equipment Corporation, said much the same about home computers in 1977.)
  28. [Image-only slide]
  29. IBM quantum computing scientists Hanhee Paik (left) and Sarah Sheldon (right). [Photo]
  30. [Image-only slide]
  31. [Image-only slide]
  32. [Image-only slide]
  33. [Image-only slide]
  34. January 2018: 50 qubits.
  35. Quantum computer: nobody needs one at home; search for "IBM quantum experience" or visit https://quantumexperience.ng.bluemix.net/qx
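
For anyone who wants to try it from home anyway, a minimal "hello quantum" in Qiskit, the Python SDK behind the IBM Q Experience linked above: prepare a Bell pair and sample it on a local simulator. This assumes qiskit and qiskit-aer are installed; exact APIs vary between versions:

    from qiskit import QuantumCircuit
    from qiskit_aer import AerSimulator

    qc = QuantumCircuit(2, 2)
    qc.h(0)             # put qubit 0 into superposition
    qc.cx(0, 1)         # entangle qubit 1 with qubit 0
    qc.measure([0, 1], [0, 1])

    counts = AerSimulator().run(qc, shots=1000).result().get_counts()
    print(counts)       # ~50/50 between '00' and '11': correlated outcomes
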
  36. ibm.biz/AxelKoester
