- Big data refers to large volumes of data from various sources that is analyzed to reveal patterns, trends, and associations.
- The evolution of big data has seen it grow from just volume, velocity, and variety to also include veracity, variability, visualization, and value.
- Analyzing big data can provide hidden insights and competitive advantages for businesses by finding trends and patterns in large amounts of structured and unstructured data from multiple sources.
Evolution of Big Data
ICT Business Breakfast
Durban, 17 September 2014
Willy Govender
2.
What is Big Data?
"Large volumes of a wide variety of data collected from various sources across the enterprise including transactional data from enterprise applications/databases, social media data, mobile device data, unstructured data/documents, machine-generated data and more." Source: IDG, "Big Data: Growing Trends and Emerging Opportunities"
3.
Data Sources
Structured
• Spreadsheets
• Relational Databases
• ERP
• CRM
• Legacy systems
• File share
Unstructured
• Documents
• Machine Data
• Messaging
• Photographs
• Video
• Social Media
• Web traffic logs
"90% of all data ever created, was created in the past two years. From now on, the amount of data in the world will double every two years."
Enterprise
Cloud
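The doubling claim in the quote implies an annual growth rate that is easy to work out. A small sketch, taking the quote's two-year doubling period at face value; the function names are illustrative and not from the slides:

```python
def annual_growth_rate(doubling_period_years: float) -> float:
    """Annual growth rate implied by a fixed doubling period."""
    return 2 ** (1 / doubling_period_years) - 1

def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """How many times larger the data volume is after `years`."""
    return 2 ** (years / doubling_period_years)

rate = annual_growth_rate(2.0)
print(f"Implied annual growth: {rate:.1%}")               # roughly 41% per year
print(f"Growth over 10 years: {growth_factor(10):.0f}x")  # 2**5 = 32x
```

Doubling every two years therefore means roughly 41% growth per year, or a 32-fold increase over a decade.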
4.
The Evolution of Big Data
Big data is traditionally described by the 3Vs (now 5Vs or 7Vs)
• Volume (amount of data collected: terabytes/exabytes)
• Velocity (speed/frequency at which data is collected)
• Variety (different types of data collected)
Now experts are adding "veracity, variability, visualization, and value"
Big data is not new
• Supercomputers have been collecting scientific/research data for decades
• However, it is now being used commercially for competitive advantage
• And now we are able to collect a variety of data from multiple devices and sources
Big data is the evolution of the BI ecosystem from data warehousing
• Does not make DW obsolete
• Big Data approaches are reducing the costs of data management
• Data still needs to be standardized, data quality maintained, and access provided to constituent communities
• Data management will continue to be an evolutionary process
Big data is simply a new data challenge that requires leveraging existing systems in a different way
5.
So, what does Big Data do?
Focuses on finding hidden threads, trends, or patterns which may be invisible to the naked eye
Data is stored on clusters of servers (e.g. Apache Hadoop, used on the Amazon cloud)
A set of tasks processes the data in different segments of the cluster, then breaks down the results into more manageable chunks which are summarized
Requires mathematical and statistical expertise as well as creative, communicative, problem-solving, and business skills
Obviates the need for data alignment or data migration, or the requirement to move data into one place for cross-referencing. This is achieved through indexes and crawlers (like Google's) which constantly mine data and update the indexes.
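The split, process, and summarize pattern on this slide is the idea behind MapReduce, the processing model associated with Hadoop. Below is a minimal single-machine sketch in plain Python. It does not use Hadoop's API, and the sample log lines and function names are made up for illustration:

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands for the records held on one node.
segments = [
    ["error disk full", "login ok", "error timeout"],
    ["login ok", "login ok", "error disk full"],
    ["error timeout", "login failed"],
]

def map_segment(records):
    """Map step: count words within one segment, independently of the others."""
    counts = Counter()
    for record in records:
        counts.update(record.split())
    return counts

def merge_counts(left, right):
    """Reduce step: combine two partial results into one summary."""
    return left + right

# On a real cluster each segment would be processed in parallel on its own node.
partials = [map_segment(segment) for segment in segments]
summary = reduce(merge_counts, partials, Counter())

print(summary.most_common(3))
```

Because each segment is counted on its own, only the small partial results need to travel between machines, not the raw data.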
6.
Framework and Data Flows
Data Models, Structures, Types
• Data formats, relational/non-relational, file systems, etc.
• Big Data Management
Big Data Lifecycle (Management)
• Big Data transformation/staging
• Recording, Storage, Archiving
Big Data Analytics and Tools
• Big Data Applications
• Target use, presentation, visualisation
Big Data Infrastructure (BDI)
• Storage, Compute (High Performance Computing), Network
• Sensor network, target/actionable devices
• Big Data Operational support
Big Data Security
• Data security at rest and in motion, trusted processing environments
Collection and Registration
Filtering, Classification and Enrichment
Analytics, Modelling and Prediction
Presentation and Visualization
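The four data-flow stages listed here can be pictured as functions chained into a pipeline. This is a toy sketch under invented assumptions: the sensor readings, the threshold of 30, and all function names are hypothetical and only show how data moves from one stage to the next.

```python
from statistics import mean

def collect():
    """Collection and Registration: gather raw records and tag each with its source."""
    raw = [("sensor-a", 21.5), ("sensor-b", None), ("sensor-a", 22.1), ("sensor-b", 35.0)]
    return [{"source": source, "value": value} for source, value in raw]

def filter_and_enrich(records):
    """Filtering, Classification and Enrichment: drop bad rows, add a class label."""
    clean = [r for r in records if r["value"] is not None]
    for r in clean:
        r["level"] = "high" if r["value"] > 30 else "normal"
    return clean

def analyse(records):
    """Analytics, Modelling and Prediction: here, just an average per source."""
    by_source = {}
    for r in records:
        by_source.setdefault(r["source"], []).append(r["value"])
    return {source: mean(values) for source, values in by_source.items()}

def present(stats):
    """Presentation and Visualization: render the results as text lines."""
    return [f"{source}: {avg:.1f}" for source, avg in sorted(stats.items())]

report = present(analyse(filter_and_enrich(collect())))
print("\n".join(report))
```

Keeping each stage as a separate function means one stage can be swapped (for example, a real model in place of the average) without touching the others.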
7.
What challenges can you expect?
Platforms
• High-end data warehousing tools
• Open source technologies are challenging these by accessing data from multiple servers rapidly, in native form
• Selection of Enterprise Search Tools
Skills
• Managing Data Volumes
• Ability to really understand what can be achieved
• Open source platforms not easy to use
• Data scientists now required
Leadership
• New territory for IT professionals, so planning, marketing, ROI, etc. are issues
• Getting Data on the Board's agenda
Walmart analyses real-time social media data for trends to guide online ad purchases
8.
Enterprise Search: Vendors
[Chart: vendors plotted by TCO (low to high) against feature set (low to high); the surviving labels are "Niche Progressive" and "Niche Traditional"]
9.
Challenges in Big Data
Increasing Amount of Disorganized Data and Data Sources (structured & unstructured)
Not tackling Big Data in enterprises …
• Provides greater opportunity for failure: lack of information can lead to wrong decisions
• Limits productivity: more time and effort needed to find information
• Frustrates search users: information is expected to be readily available and complete
Data sources: Marketing Data, Data Warehouse, Social Media, Research Databases, Office Files, Transactional Data, Acquisition Data, etc.
[Chart: digital data volume by year, 2010 to 2020]
10.
Opportunity in Big Data
Accessible Data Has Value
[Chart: digital data volume by year, 2010 to 2020, reaching 35 zettabytes; 48% CAGR. Source: IDC]
Status Quo: Solution Types
• No Specific Solutions: too hard and expensive
• Homegrown: hard to maintain and insufficient
• Traditional Solutions: waste countless months on inflexible solutions
11.
Q-Sensei Product: Aimed at bringing a Big Data approach to all enterprises
Traditional Approaches
• Complex products
• Rigid delivery model
• Pre-defined usage
• Expensive
• Limited audience
• Exhausting implementation
• Disparate solutions
• Poor interaction design
Q-Sensei Revolution
• Simple
• Powerful
• Fast
• Flexible
• Broad application
• Interactive
• Easy delivery model
• For everyone
12.
Case Study: Mention in the Wall Street Journal in 2012
Netflix was able to analyze traffic details for various devices, spot problem areas and add network throughput to help prepare for future demand. Netflix was also able to get more insight into the type of content customers preferred, which enabled it to make more accurate suggestions as to what subscribers might like.
13.
Case Study
Overview
Netflix: World's Leading Internet Television Network (ITN)
• Premier Internet subscription service for streaming media and DVD-by-mail services
• Over 50 million subscribers in 40+ countries; Revenue 2013: $4.37 billion
• Contract Management: Permission/licensing agreements with content creators
Q-Sensei
• Leader in interactive, contextual search changing the way companies search and analyze data
• Patented powerful multidimensional search and index capability
• Gives developers full access to award-winning technology and empowers them to build robust search and analytics applications for all data needs
14.
Case Study: Search in Contracts
Goals and Key Challenges
1. Make searching their copious contract documentation more manageable and easier for end users
2. Integrate and unify their highly structured metadata with their unstructured content data
3. Incorporate Optical Character Recognition (OCR) of scanned documents during the data ingestion process
4. Integrate with in-house, Drupal-based content management system
5. Flexibility to consume the data from their custom system
6. Data model that meets various needs of personnel
7. Timeline of only 3 months
15.
Case Study: Search in Contracts
Solution and Successes
1. In 3 months, Q-Sensei conceptualized and deployed a solution for contract search needs using Fuse (including usability testing)
2. Addition of further capabilities based on end user feedback:
• n-gram phrase search
• date range search
• multi-sort of facets
• grid view of results
3. The flexibility and modular architecture of Fuse enables the customer to implement the platform for further use cases (knowledge base search, log analysis, usage analysis, etc.)
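To make the search features named on this slide concrete, here is a small sketch of facet counting combined with a date range filter and a word n-gram phrase match. It is not Q-Sensei Fuse's API, which the slides do not show; the contract records, field names, and functions are all invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical contract records; the fields are illustrative only.
contracts = [
    {"title": "Streaming licence agreement", "region": "EU",
     "type": "licence", "signed": date(2012, 3, 1)},
    {"title": "Content licence agreement renewal", "region": "US",
     "type": "licence", "signed": date(2013, 6, 15)},
    {"title": "Studio distribution agreement", "region": "US",
     "type": "distribution", "signed": date(2013, 9, 30)},
]

def ngrams(text, n):
    """All runs of n consecutive lowercase words in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def phrase_match(record, phrase):
    """n-gram phrase search: the phrase must appear as consecutive words in the title."""
    query = tuple(phrase.lower().split())
    return query in ngrams(record["title"], len(query))

def search(records, phrase=None, signed_from=None, signed_to=None):
    """Apply the optional phrase filter and date range filter."""
    hits = records
    if phrase:
        hits = [r for r in hits if phrase_match(r, phrase)]
    if signed_from:
        hits = [r for r in hits if r["signed"] >= signed_from]
    if signed_to:
        hits = [r for r in hits if r["signed"] <= signed_to]
    return hits

def facets(records, fields=("region", "type")):
    """Count how many hits fall under each value of each facet field."""
    return {field: Counter(r[field] for r in records) for field in fields}

hits = search(contracts, phrase="licence agreement", signed_from=date(2013, 1, 1))
print([h["title"] for h in hits])
print(facets(hits))
```

The facet counts are recomputed from the current hits, which is what lets a user narrow a result set step by step.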
16.
Demo
Q-Sensei Medical Demo
• Unified Access to Publications, Grants, Patents, Office Files, Person
• Content-Based Faceted Auto Complete
• Dynamic Faceting
• Search-within-a-search capability
• Data Interaction and deep Data Correlations
• 360-degree view of information
• Multi-Dimensional Visualization
• Customizable Search Interface
• Integrated Data Sources (21m Publications, 1.8m Grants, 1.5m Patents, Office Files (DOC, XLS, PPT, PDF, …), Person DB)
Set-up (Harvesting, Importing, Data Transformation, Indexing) in 5 days
17.
Performance Metrics
Sample System
System Configuration
• Intel Ivy Bridge Quad-core 3.4GHz
• 32GB RAM
• 1TB HD
• 64-bit Linux
Performance (Based on Sample System)
• Up to 80 million documents can be indexed
• Up to 20 million records can be uploaded per hour (more than 5,000/sec)
• 100,000 search queries can be processed per minute per million documents; a query includes:
  • processing of search expression (including full text)
  • computation of eight (8) standard facets
(Latest test: September 2013)
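The per-second upload rate quoted on this slide follows from a simple unit conversion. A quick check of the figures; treating one uploaded record as one indexed document is an assumption made here, not something the slide states:

```python
RECORDS_PER_HOUR = 20_000_000   # upload rate from the slide
MAX_DOCUMENTS = 80_000_000      # index capacity from the slide
SECONDS_PER_HOUR = 3600

records_per_second = RECORDS_PER_HOUR / SECONDS_PER_HOUR

# Assumption: one uploaded record corresponds to one indexed document.
hours_to_fill_index = MAX_DOCUMENTS / RECORDS_PER_HOUR

print(f"{records_per_second:,.0f} records/sec")
print(f"{hours_to_fill_index:.0f} hours to load {MAX_DOCUMENTS:,} documents")
```

This gives about 5,556 records per second, consistent with the slide's "more than 5,000/sec", and about four hours to load the sample system to capacity under the stated assumption.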
18.
Further Use Cases
A Single Platform for Everything
Contract Management Search
• Create a more accurate and efficient contract search by exposing all metadata and using facets
• Search scanned documents with advanced OCR capabilities
Knowledge Base / Support Center Search
• Increase the efficiency of finding answers by utilizing more metadata in your knowledge base
• Embrace tags and faceted search over hierarchy to find answers more quickly
Enterprise Search
• Unify your company's information by searching all sources simultaneously
• Increase the productivity of everyone with better data accessibility
Usage Analysis
• Increase speed and agility of customer activity analysis by embracing a multidimensional view of your data
• Drive dynamic visualizations and build complex queries
Structured Data Analysis
• Understand the composition of data, find relationships, and identify trends
• View data more accurately by analyzing all attributes simultaneously
E-Commerce Faceted Navigation
• More accurately represent your products with dynamically updating facets that perform at scale
• Power more meaningful recommendations with the capability to use more metadata