Integrating Big Data Technologies in Enterprise IT (NASSCOM)
Presentation Transcript

  • Integrating Big Data Technologies in Your IT Portfolio
    Vineet Tyagi, VP Technology, Head of Innovation Labs
    vineet.tyagi@impetus.co.in | vineet.tyagi1@gmail.com | blogs.impetus.com
    Impetus Technologies Inc.
  • Outline
     Big Data
     Big Data Technologies and the Ecosystem
     Transforming your Enterprise Data Warehouse to a Big Data Warehouse
     Cost considerations
     Cloud considerations
     Operational support
     People aspect of Big Data
     Use case selection: what and where
     Q&A
    Impetus Proprietary
  • Big Data
     2.5 quintillion bytes of data are produced every day
     $6 trillion: the cost of big data (IDC/EMC)
     $650 billion per year: cost of wasted productivity due to information overload
     1 zettabyte: estimated Internet traffic by 2015
     1,800 exabytes: size of the digital universe in 2011
     90% of the data in the world today is less than two years old
     18 months: estimated time for the digital universe to double
  • Big Data  Not only the original content stored or being consumed but also about the information around its consumption  airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time  New York Stock Exchange collects 1 terabyte of structured trading data per day  Big Data Is Not the Created Content, nor Is It Even Its Consumption — It Is the Analysis of All the Data Surrounding or Swirling Around ItImpetus Proprietary 4
  • Age of Data
     From the Age of Software to the Age of Data
     Data rich and information poor
  • Big Data - Technologies  There will be More  content  devices  applications  On - Demand Access  Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high- velocity capture, discovery, and/or analysis Finding answers where there are yet to be questionsImpetus Proprietary 6
  • Big Data - Technologies
     Programmable hardware: innovation in mixware architectures; NVIDIA GPU with the CUDA library; GPGPU; OpenCL standards; SIMD-architecture-based parallel programming
     Hadoop ecosystem: Hadoop, MapReduce; HBase; Hive, Sqoop, Flume, etc.; Pig, Pig Latin; MapR, Platform Computing, Pervasive DataRush
     Big SQL: VoltDB, Hadapt, Clustrix, Xeround
     NoSQL: Membase, Cassandra, MongoDB, Hypertable, CouchDB
     Grid frameworks (distributed computation): Alchemi, ProActive, JPPF, GridGain, Microsoft HPC Server
     Machine learning: Apache Mahout, R, Weka, Orange
     Storage optimization: Voldemort, Oozie, Datameer, Rainstor, Teradata
  • EDW to Big Data Warehouse - Drivers
     Business drivers:
      Better insight
      Faster turn-around
      Accuracy and timeliness
     IT drivers:
      Reduced storage cost
      Reduced data movement
      Faster time-to-market
      Standardized toolset
      Ease of management and operation
      Security and governance
  • EDW to Big Data Warehouse  Traditional enterprise data models for application, database, and storage resources have grown over the years  cost and complexity of these models has increased along the way to meet the needs of larger sets of data  new models are based on a scaled-out, shared-nothing architecture  One size no longer fits all  Big Data Components  Hadoop: Provides storage capability through a distributed, shared- nothing file system, and analysis capability through MapReduce  NoSQL: Provides the capability to capture, read, and update, in real time, the large influx of unstructured data and data without schemas; examples include click streams, social media, log files, event data, mobility trends, and sensor and machine dataImpetus Proprietary 9
  • EDW to Big Data Warehouse [architecture diagram]
  • Cost Considerations of a Big Data Warehouse
     Initial entry costs
     Cost of experimentation
     Cost of integration and moving data
     Cost of ETL
     Query and analytics capability
     Manageability
     On-going maintenance
     Monitoring and tuning
     Changing capacity
     Additional hardware
     Cost of compliance
  • Lowering TCO of Big Data  Initial Entry Costs  Cost of Experimentation – Best Practice Patterns, learn or hire  Cost of Integration and Moving Data  Cost of ETL – Remove costly licensed tools, switch to MR for ETL or ELT  Manageability  Provisioning, management tools – You will have more than single vendor, look for multi-vendor management toolsets like Ankush from Impetus  On-Going Maintenance  Monitoring and Tuning – Automate Automate Automate  Changing Capacity  Additional Hardware – Do you know the GPU?Impetus Proprietary 12
  • Lowering TCO of Big Data  Cost of Storage  Compress Data – Rainstor type solutions  Do More with Less  Faster MR – MapR type solutions  Acunu type solutions for NoSQLImpetus Proprietary 13
  • Cloud Considerations in Big Data Warehouse
     More "virtual" servers were shipped than "physical" servers in 2011
     20% of all information running through servers by 2015 will be doing so on virtualized systems
     The challenges for cloud adoption include:
      Data preparation for conversion to the cloud
      Integrated cloud / non-cloud management
      Service-level agreements and termination strategies
      Security, backup, archiving, and disaster-control strategies
      Inter-country data transfer and compliance
  • Operational Support Considerations in Big Data Warehouse
     Big Data solution architectures
     Technology churn: Hadoop is the only constant as a paradigm
     Impedance mismatch: is your IT organization geared up to transition Big Data technologies into the enterprise?
     Unsolved challenges:
      Rapid, automatic, or rule-based single-click provisioning of Big Data clusters
      Measuring the boost that clusters/grids provide to your business data processing capabilities
      Changing your choice of cluster software at any point when you feel it is not sufficiently delivering to your needs
      Managing the Big Data solution from a single cluster-management software umbrella
     Multi-vendor, multi-technology cluster management
  • People Aspect: Big Data Warehouse
     New skill sets needed:
      Data scientists
      Developers who can think parallel
     New definitions for older roles:
      Big Data administrator?
     Invest in training
  • Use Case Selection: Big Data Scenarios
     Big input, small output
     Small input, big output
     Big input, big output
  • Stage 1 Use Case: Introduce Big Data
  • Stage 2 Use Case: Get Bolder
  • Stage 3 Use Case: Sophisticated
  • Impetus  We offer Innovative Product Engineering & Technology R&D Services and Products  Eighteen years of experience, numerous award winning products and success stories  Innovation based differentiated services  1300+ engineers, development centers in India  Pioneers in Big Data Consulting Services  Since 2008, 10+ active in-production use cases at large Fortune 100  Products and Tools to help ease Big Data adoption  Ankush, Jumbune, iLaDaP 21
  • Impetus Open Source Contributions
     Kundera (http://code.google.com/p/kundera/): an annotation-based Java library for NoSQL databases such as Cassandra, HBase, and MongoDB
     Hadoop Performance Monitoring Tool (http://hadoop-toolkit.googlecode.com): an inbuilt solution with a Suggestion Engine for quickly finding performance bottlenecks; a visual representation helps suggest remediation
     Korus (http://code.google.com/p/korus/): a parallel and distributed programming framework that improves the performance and scalability of Java applications
  • Thank You