Information Management and Analytics. AKA Discussion Papers, February 2012
Published in: Technology, Education

  1. Information Management and Analytics. AKA Discussion Papers, February 2012
  2. Challenges and opportunities in gaining advantage and leverage through data
     - Companies today are evolving into virtual networks of permanent and transient teams of people.
       - Enterprises can gain a competitive operating advantage by using social, local and mobile technology to generate leverage through individuals.
       - This leverage comes from applying targeted, specific data at the point and time of informational advantage.
     - Commonly used information architectures do not treat delivery, collaboration and interchange of ALL types of information across networks of people as a core principle.
       - Knowledge workers create, analyze, manage, decide, evaluate, and synthesize information of all types as their dominant activity throughout the enterprise.
     - Solving the right problems: companies must address two fundamental activities that intersect their daily routine:
       - Collaboration, communication and information sharing
       - Making sense of information: separating noise from the constant stream
  3. Big Data Volume Statistics and Predictions
     - Chart: Digital Storage Acquisition in zettabytes (IDC: Universal Digital Data Explosion Study). Axis years: 1990, 2005, 2010, 2015. Labeled values: 0.13 zb, 1.8 zb, 8 zb.
     - A year's worth of data generated in the 1990s is created within 1 minute in 2011.
     - Gartner: Unstructured data alone will explode to 650% of its present volume by 2017.
     - Are you positioned to take advantage of the big data predictions?
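To put the Gartner figure in annual terms, the following is a rough sketch only. It assumes "present" means 2012 (the date of this deck), that "650% of its present volume" means a 6.5x multiple, and that growth compounds evenly over the five years to 2017. The function name is hypothetical.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate that turns 1.0 into `multiple` after `years`."""
    return multiple ** (1.0 / years) - 1.0


# Assumed inputs: 6.5x growth over 5 years (2012 to 2017).
rate = implied_annual_growth(6.5, 5)
print(f"Implied growth per year: {rate:.1%}")
```

Under those assumptions the volume of unstructured data would grow by a little under half each year.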
  4. What is Big Data? Where Does it Come From?
     - Big Data includes both internal AND external content. Not all data must reside internally for analysis.
     - Data is organized and managed by its type of structure:
       - Structured: strictly meets its object definition. Examples: relational, flat file, web services, …
       - Semi-structured: has a structure but may differ greatly between files. Examples: Excel, Word, XML, HTML, tweets, email, …
       - Unstructured: has little to no structure and is not easily read by a machine. Examples: PDF, X-ray, legal documents, video, IM
     - Big Data is everywhere: search engines, instant messaging, social media, legal documents and contracts, medical records and test/scan outcomes, digital media, internal unstructured documents, stock tickers, press releases, et al.
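As a minimal sketch of how the three structure types above might be used to route incoming files, the lookup below maps file extensions to the slide's categories. The extension list and the names `STRUCTURE_BY_EXTENSION` and `structure_type` are illustrative assumptions, not part of the original deck; the category assigned to each format follows the slide's own examples.

```python
from pathlib import Path

# Hypothetical mapping, following the slide's examples for each category.
STRUCTURE_BY_EXTENSION = {
    ".csv": "structured",        # flat file
    ".xlsx": "semi-structured",  # Excel
    ".docx": "semi-structured",  # Word
    ".xml": "semi-structured",
    ".html": "semi-structured",
    ".eml": "semi-structured",   # email
    ".pdf": "unstructured",
    ".mp4": "unstructured",      # video
}


def structure_type(filename: str) -> str:
    """Return the slide's structure category for a file, or 'unknown'."""
    suffix = Path(filename).suffix.lower()
    return STRUCTURE_BY_EXTENSION.get(suffix, "unknown")


for name in ["contract.PDF", "orders.csv", "feed.xml", "notes"]:
    print(name, "->", structure_type(name))
```

A real pipeline would inspect content rather than trust the extension; the lookup only illustrates the classification itself.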
  5. The search challenge with unstructured data: Data Science
     - Chart (source: Brewster Kahle). Vertical axis: % of relevant data that are returned. Horizontal axis: % of returned data that are relevant.
     - Quadrant labels, top row: Inefficient, Optimal. Bottom row: Worst, Incomplete.
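The two axes of this chart correspond to what information retrieval calls recall (% of relevant data that are returned) and precision (% of returned data that are relevant). The sketch below computes both for one search result and names the quadrant. The 0.5 threshold and the function names are assumptions made for illustration, and the quadrant mapping is read from the chart layout rather than stated on the slide.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query as fractions in [0, 1]."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def quadrant(precision, recall, threshold=0.5):
    """Name the chart quadrant; the threshold is an assumed cut-off."""
    high_precision = precision >= threshold
    high_recall = recall >= threshold
    if high_precision and high_recall:
        return "Optimal"
    if high_recall:
        return "Inefficient"  # most relevant items found, buried in noise
    if high_precision:
        return "Incomplete"   # results are clean but many items are missed
    return "Worst"


p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r, quadrant(p, r))
```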
  6. How to Reveal the Content in Big Data and Determine its Relevance and Confidence
     - Sentiment analysis, a form of text analytics, provides the ability to filter big data to determine its relevance. (Social media, search engines, et al.)
       - Diagram: Capture Tweets on Brand X, then Sentiment Analysis, then one of three outcomes: Happy, Unhappy, Need Help.
     - Textual ETL breaks down content to its granular information using taxonomies and ontologies. (pdf, doc, swift, et al.)
       - For unstructured:
         - stop word processing
         - stemming
         - alternate spelling
         - synonym concatenation
         - homograph resolution
         - spell checking
         - word and phrase proximity
         - final index trimming
       - For semi-structured:
         - textual structure mapping
         - variable pattern recognition
         - variable symbol recognition
         - multiple index type support
         - utilities including:
           - raw data hidden character display
           - multiple path processing
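A few of the unstructured-text steps listed above (stop word processing, alternate spelling, synonym handling, stemming) can be sketched as a small pipeline. This is a toy illustration, not the Textual ETL product's behavior: the word lists are invented, the stemmer is a naive suffix stripper rather than a real algorithm such as Porter's, and synonyms are simply mapped to one preferred term.

```python
import re

# Hypothetical word lists, chosen only for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}
ALTERNATE_SPELLINGS = {"colour": "color", "organisation": "organization"}
SYNONYMS = {"physician": "doctor", "invoice": "bill"}
SUFFIXES = ("ing", "ed", "s")  # naive stemming rules, not a real stemmer


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> list:
    """Turn raw text into a list of normalized index terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = []
    for token in tokens:
        if token in STOP_WORDS:  # stop word processing
            continue
        token = ALTERNATE_SPELLINGS.get(token, token)  # alternate spelling
        token = SYNONYMS.get(token, token)             # synonym mapping
        terms.append(stem(token))                      # stemming
    return terms


print(index_terms("The physician is reviewing the colour scans"))
```

The order of the steps is a design choice: synonyms are resolved before stemming here so that the lookup tables can hold whole words.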
  7. The Value of Big Data
     - Data Science: To Support or To Drive?
       - Perform analysis and exploration of Big Data.
       - Analyze RAW and/or integrated data, remove 'noise', mine for peaks and valleys, determine relevance and exploit the data for predictive analysis.
     - Value pyramid, drawn against an ROI axis:
       - Top Level: Integrate and enrich with external data.
         - Predictive Analysis & Exploration Reports
         - Pyramid labels: Integrated and Big Data Utilization; Predictive Analysis: RAW Internal & External Data; Drive the Business
       - Mid Level: Integrate and enhance proprietary data.
         - BI Reports
         - Pyramid labels: Informed Decisions/Insights: Integrated Internal Data & Purchased External Data; Enhanced Support
       - Bottom Level: Support operational systems.
         - Operational Reports
         - Pyramid labels: Internal Proprietary Business Data; Operate & Support
  8. Big Data Architecture
     - Non-relational distributed file system. Can augment existing systems.
     - Provides the ability to internalize optimal big data while continuing to access and report on external data to position for predictive analysis.
     - Can use open source (Hadoop, Clojure, Storm, et al.) and/or an enterprise-level vendor to manage, monitor and support it, such as Teradata, Greenplum, Netezza, Exadata, et al.
     - Scalable and extensible solution.
     - MPP (Massively Parallel Processing) reduces query response and acquisition time.
     - Capable of handling RAW data.
     - Additional benefits:
       - Increased IT agility in meeting business requirements
       - Softens the brittleness of the data models
       - Ability for real-time analysis
       - Positions BI for next-generation architecture
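The idea behind MPP can be sketched as scatter-gather: split the data into partitions, compute a partial aggregate on each partition concurrently, then merge the partial results. The example below is a single-machine illustration only. A thread pool stands in for the separate nodes of a real MPP system, and all function names are hypothetical; it does not use or imitate the API of any product named on the slide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows round-robin into n_parts partitions (stand-ins for nodes)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_count(rows):
    """Per-partition aggregate: count rows by key."""
    return Counter(key for key, _ in rows)


def parallel_count(rows, n_parts=4):
    """Scatter rows, aggregate each partition concurrently, then merge."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_count, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # gather step: add up the partial counts
    return total


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]
print(dict(parallel_count(rows, n_parts=2)))
```

Because each partition is aggregated independently, adding partitions (nodes) shortens the per-partition work, which is the source of the reduced query response time the slide mentions.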
  9. Big Data Management
     - As with all forms of data, a critical aspect of getting value out of big data is data management best practices.
     - Data management practices include:
       - Data quality and discovery
       - Relationship or linking algorithms
       - Data governance
       - Confidence levels and status codes
       - Metadata management
     - Information available about the data should include:
       - Where did the data point come from?
       - What type of cleansing, linkage or modification was performed?
       - When did this data arrive?
       - What is the temperature of the data?
       - Who are the consumers of the data?
       - When is the data required?
       - What is the value of the data?
       - What is it linked to?
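One way to make the questions above concrete is to attach a metadata record to each data point or data set. The dataclass below is a minimal sketch under stated assumptions: every field name, the "hot/warm/cold" temperature scale, and the 0 to 1 confidence range are illustrative choices, not something the deck specifies.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DataPointMetadata:
    """Hypothetical record answering the slide's questions about a data point."""
    source: str                               # where did it come from?
    arrived_at: datetime                      # when did it arrive?
    modifications: List[str] = field(default_factory=list)  # cleansing/linkage steps
    temperature: str = "cold"                 # assumed scale: hot, warm, cold
    consumers: List[str] = field(default_factory=list)      # who uses it?
    required_by: Optional[datetime] = None    # when is it required?
    value: Optional[str] = None               # what is its value?
    linked_to: List[str] = field(default_factory=list)      # what is it linked to?
    confidence: float = 0.0                   # assumed range: 0.0 to 1.0

    def record_modification(self, step: str) -> None:
        """Append a cleansing, linkage or modification step to the history."""
        self.modifications.append(step)


meta = DataPointMetadata(
    source="twitter-feed",
    arrived_at=datetime(2012, 2, 1, 9, 30),
    temperature="hot",
    consumers=["marketing"],
)
meta.record_modification("deduplicated")
print(meta)
```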
