Does Big Data Spell Big Costs- Impetus Webinar
- 1. Impetus Technologies Inc.
Does Big Data Spell Big Costs?
1 © 2014 Impetus Technologies
Recorded version available at
http://www.impetus.com/webinar_registration?event=archived&eid=56
- 2. Outline
• Big Data – Current Scenario
• Cost components in a Big Data Warehouse
• Best Practices - Reducing the cost of Big Data solutions
– Cost of storage
– Technologies- What and Where?
– Big Data strategies
– Our recommendations to reduce TCO
- 3. Big Data – Current Scenario
2.5 Quintillion Bytes produced every day
$6 Trillion Big Data cost (source: IDC/EMC)
$650 Billion Cost of wasted productivity because of information overload
1ZB Estimated Internet Traffic by 2015
1800EB Size of the digital universe in 2011
90% of the data in the world today has been created in the last two years alone
18 Months Estimated time for the digital universe to double
- 4. Age of Data
Age of Software Age of Data
- 6. Using Commodity Hardware for Big Data
Commodity Hardware
• Pros
– Build your own
– The promise of innovation
• Cons
– Building reliable storage – $1 per GB
– Add the cost of managing / monitoring / hosting
- 7. Using Open Source & Cloud Computing
Open Source
– Pros
• Software is free! Glory to the Elephant
– Cons
• Cost of Training – thinking in parallel is not intuitive
• Cost of Support – support is not free
Cloud Computing
– Pros
• Rent what you need
– Cons
• $14,000 a month for 100 TB data – storage only
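A rough way to put the cloud figure next to the commodity-hardware figure from the previous slide is to convert both to dollars per GB. The sketch below uses only the two numbers quoted on the slides ($14,000 a month for 100 TB, and $1 per GB to build reliable storage). The decimal TB-to-GB conversion, the reading of $1 per GB as a one-time cost, and the function names are assumptions made for illustration; management, monitoring, and hosting costs are left out.

```python
# Figures come from the slides; everything else here is an illustrative assumption.
GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1000 GB)


def cloud_cost_per_gb_month(monthly_usd: float, terabytes: float) -> float:
    """Storage-only cloud cost in USD per GB per month."""
    return monthly_usd / (terabytes * GB_PER_TB)


def months_to_match(one_time_usd_per_gb: float, monthly_usd_per_gb: float) -> float:
    """Months of cloud rental whose cumulative cost equals a one-time cost per GB."""
    return one_time_usd_per_gb / monthly_usd_per_gb


# $14,000 a month for 100 TB (storage only).
cloud_rate = cloud_cost_per_gb_month(14_000, 100)

# Assumption: the $1 per GB for reliable commodity storage is a one-time build cost
# and excludes managing / monitoring / hosting.
breakeven = months_to_match(1.0, cloud_rate)

print(f"cloud storage: ${cloud_rate:.2f} per GB-month")
print(f"cloud rental matches the $1/GB build cost after {breakeven:.1f} months")
```

Under these assumptions the cloud rate works out to $0.14 per GB per month, so rental overtakes the build cost in a little over seven months; adding operating costs to the commodity side would push that point later.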
- 8. Big Data Warehouse- Cost Components
• Initial entry costs- Cost of experimentation
• Cost of integration and moving data - Cost of ETL
• Query and analytics capability
• Manageability
• On-going maintenance - Monitoring and tuning
• Changing capacity - Additional hardware
• Cost of compliance
- 9. Lowering TCO of Big Data
• Hardware
– Lower cost of storage
– Lower cost of computation
• Software
– Make things faster
– Do more with less
- 10. How to reduce the cost of storage?
– Compress – RainStor and similar solutions
• Just make sure your ‘Read Throughput’ is high
– Retain all vs. load & process
• Set up “data pipelines” or use ILM (Information Lifecycle Management) principles
• Creation and Receipt
• Distribution
• Use
• Maintenance
• Disposition
– Focus on Big Data but don’t forget the “Small Data”
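The compression advice above trades storage space against read speed, and both sides can be measured before committing to a product. The sketch below is a minimal illustration using Python's standard `zlib` module as a stand-in; it is not how RainStor or any similar product works, and the sample data and the `measure` helper are hypothetical.

```python
import time
import zlib


def measure(data: bytes, level: int = 6) -> dict:
    """Compress data, then time decompression to estimate read throughput."""
    compressed = zlib.compress(data, level)

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start

    # A storage saving is only useful if the data round-trips intact.
    assert restored == data

    return {
        "ratio": len(data) / len(compressed),
        # Uncompressed megabytes delivered per second of decompression time.
        "read_mb_per_s": (len(data) / 1e6) / elapsed if elapsed > 0 else float("inf"),
    }


# Hypothetical, highly repetitive log-style data; real data will compress less well.
sample = b"2014-01-01,sensor-42,OK,23.5\n" * 200_000
stats = measure(sample)

print(f"compression ratio: {stats['ratio']:.1f}x")
print(f"read throughput:   {stats['read_mb_per_s']:.0f} MB/s")
```

Running the same measurement on a representative slice of real data, at several compression levels, gives a first indication of whether the read throughput stays acceptable for the storage saved.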
- 11. Technologies: What and Where?
What?
• Open Source vs. Commercial software?
• Specialized hardware/appliances vs. commodity
hardware?
• Vendor lock-in vs. vendor independence?
• Cost of latencies?
• Cloud?
Where?
• OLTP (NoSQL) vs. OLAP (DW: MapReduce & MPP)
- 12. OLAP: Big Data Scenarios
- 13. Data Tapping Point, Cost & Latency
- 14. Indirect Analytics over Hadoop
- 15. Direct Analytics over Hadoop
- 16. Analytics over Hadoop with MPP DW
- 17. Selecting the Right Technology
Key considerations
– $ per TB
– Business Continuity/ Cost/ Vendor Lock-in
– Latency Needs
- 18. Choosing MPP
$ per TB Driven
– EMC Greenplum
– Teradata, Aster
– HP Vertica
– Oracle Exadata
– Netezza
– ParAccel
– Others
- 19. Faster MapReduce & Hadoop
Business Continuity/ Cost/ Vendor Lock-in
– MapR
– HPCC
– Hadapt
– Pervasive DataRush, HStreaming
– Cloud Map Reduce
– DataStax
– Platform Computing
– MARS, GPMR
– ParStream
- 20. OLTP: NoSQL Solutions
Latency Needs
• Column stores
– HBase, Cassandra
• Document stores
– MongoDB, CouchDB
• Key-value stores
– Redis, Riak etc.; Kyoto Cabinet/Tokyo Tyrant, Berkeley DB
• GraphDB
– Neo4j
• Cloud stores
– SimpleDB
- 21. OLTP: New Era RDBMS Version
• Postgres, InfiniDB, Infobright
• MySQL Cluster
• GridSQL, EnterpriseDB
• MS SQL
• Sybase IQ
• Specialized stores
– VoltDB, MarkLogic, Clustrix
• Xeround
• ParStream
• Oracle NoSQL
- 22. Recommendations- Cost Components of Big
Data Warehouse
• Initial Entry Costs - Cost of Experimentation
We recommend – Follow best practices; learn or hire
• Cost of Integration and Moving Data- Cost of ETL
We recommend – Remove costly licensed tools; switch to MapReduce
for ETL or ELT
• Manageability - Provisioning, management tools
We recommend – Opt for multi-vendor management toolsets,
e.g. Impetus Ankush
• On-Going Maintenance- Monitoring and Tuning
We recommend – Automate! Automate! Automate!
• Changing Capacity - Additional Hardware
Do you know the GPU?
- 23. Recommendations- Hardware & Software
• Cost of Storage- Compress Data
We recommend – Opt for RainStor or similar solutions
• Do More with Less - Faster MR
We recommend – MapR or similar solutions
– Acunu and related solutions for NoSQL
- 24. About Impetus
- 25. • Strategic partners for software product engineering and
R&D
• Thought leaders in cutting-edge technologies
• Mature processes and practices that are methodical, yet
flexible
• Diverse domain expertise
Our Services in Big Data and Analytics
Expert Consulting
Proof of Concept & Implementation
Support Services
- 26. Big Data Quick Start Program
Three Modules
• Gear up (1 day session)
• Base Camp (4 day session)
• Summit (5 day session)
- 28. Thank You
Write to us at inquiry@impetus.com
Follow us on Twitter @impetustech