copyright @Sixth Sense Advisors Inc 2012

Future of Data

Big Data
Big Data can be defined as data that grows in volume, velocity, variety and complexity at an unprecedented pace. This growth and complexity present challenges for capture, storage, management, analysis and visualization using the typical BI tool stack.

Tapping into the data

Business
• Structured data used today
• Big Data existing across the enterprise that can be made available to business

Infrastructure
• Today we do Big or Small compute with Small and Large structured data sets
• Big Data will mean Big or Small compute with Big data sets, not always available in structured or semi-structured formats

Analytics
• Analytics is the key technique to analyze, visualize and monetize Big Data
• The field of analytics is resurging with the advent of Big Data
    • Social Analytics
    • Sensor Analytics
    • Text Analytics
    • Deep Data Mining
• Analytics needs metadata for integration
• Applications
    • Fraud Detection
    • Campaign Optimization
    • Demand and Supply Optimization
    • Forecast Optimization

Long Tail
[Figure: The Old Way (Pareto Principle, Control or 80/20 rule), with the 20% head marked, compared with The New Way (with a bigger, longer tail) when Web 2.0 is applied. Source: http://en.wikipedia.org/wiki/The_Long_Tail]

2008 US Presidential Elections
$32 million raised from 275,000 people who gave $100 or less

Long Tail Example
Web 2.0 significantly increases the total value contributed or received by aggregating the long tail of smaller-value donors.
[Figure: long-tail curve. Head (20%): high $ value donors, small constellation. Tail: low $ value donors, larger constellation, labeled BIG Data. Source: http://en.wikipedia.org/wiki/The_Long_Tail]

Brand Management

What do we collect
• Facebook has an average of 30 billion pieces of content added every month
• YouTube receives 24 hours of video every minute
• 5 billion mobile phones were in use in 2010
• A leading retailer in the UK collects 1.5 billion pieces of information to adjust prices and promotions
• Amazon.com: 30% of sales come from its recommendation engine
• A Boeing jet engine produces 20 TB per hour for engineers to examine in real time to make improvements

Why DWBI Fails Repeatedly
Lost value = Sum (Latencies) + Opportunity Cost
[Figure: business value plotted against time. Starting from the business situation, value is lost across data latency (until data is ready), analysis latency (until information is available) and decision latency (until the decision is made). Together these make up the action time, or action distance.]
Base graph courtesy: Dr. Richard Hackathorn

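The lost-value relation on this slide can be worked through with a small sketch. This is only an illustration under stated assumptions, not Dr. Hackathorn's model: the names `Latencies` and `lost_value`, the linear `decay_per_hour` rate used to convert elapsed time into value, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """The three latencies from the slide, in hours (illustrative unit)."""
    data_hours: float      # business situation -> data is ready
    analysis_hours: float  # data is ready -> information is available
    decision_hours: float  # information is available -> decision is made

    def action_time(self) -> float:
        # Action time (action distance) is the sum of the three latencies.
        return self.data_hours + self.analysis_hours + self.decision_hours


def lost_value(lat: Latencies, decay_per_hour: float, opportunity_cost: float) -> float:
    # ASSUMPTION: value decays linearly with time; the slide does not say how
    # latencies are converted into value, so this rate is hypothetical.
    return lat.action_time() * decay_per_hour + opportunity_cost


# Hypothetical example: a nightly batch load, a working day of analysis,
# and half a day to reach a decision.
example = Latencies(data_hours=24, analysis_hours=8, decision_hours=4)
print(example.action_time())
print(lost_value(example, decay_per_hour=100.0, opportunity_cost=500.0))
```

Under these assumptions, shortening any one of the three latencies reduces the lost value by the same amount per hour, which is why the slide treats them as a single sum.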
The Data Landscape
[Figure: data landscape diagram showing several Transactional Systems, each with an ODS, the Enterprise Datawarehouse, Datamarts & Analytical Databases, Data Transformation, Reports, Dashboards, Analytic Models and Other Applications.]

ACID Kills
• Atomic: All of the work in a transaction completes (commit) or none of it completes.
• Consistent: A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
• Isolated: The results of any changes made during a transaction are not visible until the transaction has committed.
• Durable: The results of a committed transaction survive failures.

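The atomicity and consistency properties above can be seen in a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the account names and the helper functions `open_bank`, `transfer` and `balances` are hypothetical, chosen only for illustration. The database is in memory, so durability is not demonstrated here.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with a constraint that defines consistency."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst, then debit src. Both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


bank = open_bank()
print(transfer(bank, "alice", "bob", 30), balances(bank))
print(transfer(bank, "alice", "bob", 500), balances(bank))
```

In the second transfer the credit to `bob` succeeds first, the debit from `alice` then violates the constraint, and the rollback undoes the credit. This per-transaction bookkeeping is the cost the slide title alludes to when write volumes become very large.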
BIG Data Scenarios
EXAMPLES

To: Bob.Collins@bankwithus.com

Dear Mr. Collins,

This email is in reference to my bank account which has been efficiently handled by your bank for more than five years. There has been no problem till date until last week the situation went out of the hand.

I have deposited one of my high amount cheque to my bank account no: 65656512 which was to be credited same day but due to your staff carelessness it wasn’t done and because of this negligence my reputation in the market has been tarnished. Furthermore I had issued one payment cheque to the party which was showing bounced due to “Insufficient balance” just because my cheque didn’t make on time.

My relationship with your bank has matured with the time and it’s a shame to tell you about this kind of services are not acceptable when it is question of somebody’s reputation. I hope you got my point and I am attaching a copy of the same for further rapid procedures and remit into my account in a day.

Yours sincerely
Daniel Carter
Ph: 564-009-2311

BIG Data Text Example
• We will often imply additional information in spoken language by the way we place stress on words.
• The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it. (The stressed word is shown in capitals below.)
    • "I never said she stole my money" - Someone else said it, but I didn't.
    • "I NEVER said she stole my money" - I simply didn't ever say it.
    • "I never SAID she stole my money" - I might have implied it in some way, but I never explicitly said it.
    • "I never said SHE stole my money" - I said someone took it; I didn't say it was she.
    • "I never said she STOLE my money" - I just said she probably borrowed it.
    • "I never said she stole MY money" - I said she stole someone else's money.
    • "I never said she stole my MONEY" - I said she stole something, but not my money.
• Depending on which word the speaker places the stress, this sentence could have several distinct meanings.
Example Source: Wikipedia

Pattern Detection

Clustering Techniques
• K-Means
• Maximin
• Agglomerative
• Divisive

Classification Techniques
• Regression
• Naive Bayes
• Neural Networks
• Back Propagation
• Recursively Splitting
• K-Nearest Neighbor
• Minimum Distance

Reduction Techniques
• Backward Elimination
• Forward Selection
• Attribute Removal
• Principal Components

Utilities
• Accuracy Measures
• Range Filters
• K-Fold Cross Validation
• Merge & Subset
• Vector Magnitude

Examples
• Text: OCR, Machine, Digital
• Face recognition, verification, retrieval
• Fingerprint recognition
• Speech recognition
• Medical diagnosis: X-Ray, EKG analysis
• Machine diagnostics data
• Geological data
• Automated Target Recognition (ATR)
• Image segmentation and analysis (recognition from aerial or satellite photographs)

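As an illustration of one classification technique listed on this slide, here is a minimal K-Nearest Neighbor sketch in plain Python. The function name `knn_classify`, the two-dimensional training points and the labels "low" and "high" are hypothetical and stand in for real feature vectors such as sensor or image measurements.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    """
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
TRAIN = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"), ((8.5, 9.0), "high"), ((9.0, 8.0), "high"),
]

print(knn_classify(TRAIN, (1.2, 1.1)))
print(knn_classify(TRAIN, (8.2, 8.4)))
```

An odd `k` is used so that a two-class vote cannot tie. The sketch scans every training point for each query, which is fine for a handful of points but is exactly the kind of work that needs partitioning once data sets become large.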
The Normal Way Results In ...

Performance
Re-engineering a Ferrari engine in a Yugo does not make the fastest race car.

Current Data Management Platform (RDBMS + ETL Programs + BI)
+ New Data Types
+ New volume
+ New Analytics
+ New Data Retention
+ New Data Workloads

• POOR Performance
• Failed

Big Data and You
• You need to write data quickly and reliably
• Incoming data streams are different in type, size and complexity
• But writing it to disk or memory is not the ultimate goal
• You need to validate data in real time
• You need to count and aggregate as you write
• You need to analyze in real time, because later, even if only seconds later, is historical
• You need to scale up and scale out on demand

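The requirements to validate, count and aggregate as you write can be sketched as follows. This is a single-process illustration only: the class name `StreamAggregator`, the event shape (a dict with `source` and `value` keys) and the in-memory list standing in for storage are all assumptions, and the sketch does not address scale-out.

```python
from collections import defaultdict


class StreamAggregator:
    """Validates each event on arrival and updates running aggregates as it is written."""

    def __init__(self):
        self.accepted = []                # stands in for durable storage (assumption)
        self.rejected = 0                 # events that failed validation
        self.count = defaultdict(int)     # events seen per source
        self.total = defaultdict(float)   # running sum of values per source

    def write(self, event) -> bool:
        # Validate in real time: reject malformed events before they are stored.
        if (not isinstance(event, dict)
                or "source" not in event
                or not isinstance(event.get("value"), (int, float))):
            self.rejected += 1
            return False
        # Write, then count and aggregate in the same step.
        self.accepted.append(event)
        self.count[event["source"]] += 1
        self.total[event["source"]] += event["value"]
        return True

    def mean(self, source):
        # Available immediately after each write; no later batch job is needed.
        if self.count.get(source, 0) == 0:
            return None
        return self.total[source] / self.count[source]


agg = StreamAggregator()
for ev in [{"source": "sensor-a", "value": 2.0},
           {"source": "sensor-a", "value": 4.0},
           {"source": "sensor-a", "value": "bad"}]:
    agg.write(ev)
print(agg.mean("sensor-a"), agg.rejected)
```

Keeping the aggregates up to date inside `write` means a query never has to rescan stored events, which is the point of analyzing at write time rather than afterwards.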
Data Warehouse Appliance
• A Data Warehouse (DW) Appliance is an integrated set of servers, storage, OS, database and interconnect specifically preconfigured and tuned for the rigors of data warehousing.
• DW appliances offer an attractive price / performance value proposition and are frequently a fraction of the cost of traditional data warehouse solutions.

Features
• High Availability
• Standard SQL Interface
• Advanced Compression
• MPP
• Leverages existing BI, ETL and OLTP investments
• Hadoop & MapReduce Interface / Embedded
• Minimal disk I/O bottleneck; simultaneously load & query
• Auto Database Management

Hadoop & RDBMS Analogy

RDBMS: Sports car
• refined
• has a lot of features
• accelerates very fast
• pricey
• expensive to maintain

Hadoop: Cargo train
• rough
• missing a lot of luxury
• slow to accelerate
• carries almost anything
• moves a lot of stuff very efficiently

* Original slide author: Amr Awadallah, Cloudera

Map Reduce
• Technique for indexing and searching large data volumes
• Two phases, Map and Reduce
    • Map
        • Extract sets of Key-Value pairs from underlying data
        • Potentially in parallel on multiple machines
    • Reduce
        • Merge and sort sets of Key-Value pairs
        • Results may be useful for other searches

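The two phases described on this slide can be sketched in plain Python by building a small inverted index. This is not the Hadoop API: the function names `map_phase`, `reduce_phase` and `build_index` and the sample documents are hypothetical, and the map calls run sequentially here although each one is independent and could run on a separate machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: extract (key, value) pairs, here (word, document id), from one document."""
    return [(word.lower(), doc_id) for word in text.split()]


def reduce_phase(pairs):
    """Reduce: sort the pairs and merge all values that share a key."""
    merged = defaultdict(set)
    for word, doc_id in sorted(pairs):
        merged[word].add(doc_id)
    return {word: sorted(ids) for word, ids in merged.items()}


def build_index(docs):
    # Each map_phase call depends only on its own document, so the calls
    # could be distributed; this sketch simply chains them in one process.
    pairs = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return reduce_phase(pairs)


# Hypothetical documents, keyed by document id.
DOCS = {1: "Big Data grows fast", 2: "big compute small data"}
INDEX = build_index(DOCS)
print(INDEX["data"])
```

The resulting index maps each word to the documents containing it, so a later search for a word is a single lookup. This is the sense in which the reduce output "may be useful for other searches".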
Big Data Challenges
• Integration with the EDW is still an open issue: Big Data reduces to small metrics, and this translates into the current-state issues faced with EDW data
• Big Data requires a lot of taxonomy processing, especially in content-related search
• Several applications need high-performance memory architectures because the data is compute intensive, for example image processing of brain scans
• Technology is improving by the day, but integration and deployment are becoming equally complex