Big Data


Published on

Big Data presentation, 09.02.2012

Published in: Technology

Big Data

  1. Big Data, Eufris 2012
  2. Why should I care?
     McKinsey:
     • $250 billion in annual savings in the EU alone by improving the public sector
     • $600 billion in annual consumer surplus from using personal location data globally
     • Annual growth of data is remarkable
     • Data is the most valuable thing most companies have
     • Data is massively underutilized
  3. Forecast
     There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
  4. What is Big Data?
     • "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis" (IDC)
     • "Big Data is a technology that helps extract value from the digital universe." (IDC)
     • "Techniques and technologies that make handling data at extreme scale economical." (Forrester)
  5. ABC of Big Data
     • Analytics: making sense of your data, in real time, in an easy way
     • Bandwidth: ingesting, processing and delivering large amounts of data
     • Content: storing, managing and retaining large amounts of ...
  6. 3 V's of Big Data
     • Variety: Big Data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more
     • Velocity: often time-sensitive, Big Data must be used as it is streaming in to the enterprise in order to maximize its value to the business
     • Volume: Big Data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information
  7. A few core concepts
  8. Hadoop
     • The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
     • Three subprojects:
       • Hadoop Common
       • Hadoop Distributed Filesystem (HDFS)
       • Hadoop MapReduce
  9. MapReduce
     • Introduced by Google in 2004
     [Diagram: input values passing through a Map stage and a Reduce stage]
  10. MapReduce on App Engine
      • MapReduce is an experimental, innovative, and rapidly changing new feature for App Engine
  11. NoSQL, Definition 1
      "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent, a huge data amount, and more."
  12. NoSQL, Definition 2
      "In computing, NoSQL (sometimes expanded to "not only SQL") is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally." (Wikipedia)
  13. From ACID to BASE
      • ACID: Atomicity, Consistency, Isolation, Durability
      • BASE: Basically available, Soft state, Eventually consistent
  14. Big Data and cloud
  15. Big Data on AWS
  16. MapReduce on AWS
      • Not yet Hadoop 1.0.0
  17. MapReduce on AWS
      [Diagram: EC2, S3 + DynamoDB]
  18. Google BigQuery Features
      • Speed: analyze billions of rows(!) in seconds
      • Scale: terabytes of data, trillions of records
      • Simplicity: SQL-like query language, hosted on Google infrastructure
      • Sharing: powerful group- and user-based permissions using Google accounts
      • Security: secure SSL access
      • Multiple access methods: can be used via REST API, a command-line tool, a browser-based graphical interface, and Google Apps Script
  19. BigQuery example
  20. Big Data outside of cloud
  21. Oracle Big Data Appliance
      About $500,000
      18 Oracle Sun servers:
      • 864 GB main memory
      • 216 CPU cores
      • 648 TB of raw disk storage
      • 40 Gb/s InfiniBand connectivity between nodes and engineered systems
      • 10 Gb/s Ethernet connectivity
  22. Autonomy IDOL 10
      "For far too long, organizations have confined structured data to relational databases and unstructured data to simplistic keyword matching technologies..."
      "IDOL 10 brings these worlds together, allowing organizations to automatically process, understand, and act on 100 percent of their data, in real-time. The results will be dramatic, as businesses can develop entirely new applications that explore the richness and color of Human Information that live in unstructured, semi-structured, and structured forms."
      Price?
  23. Thank you!
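
The MapReduce slide names the model but shows it only as a diagram. As a rough illustration, the map, shuffle and reduce stages can be sketched in plain Python as a word count. This is a single-process toy, not Hadoop or the App Engine or AWS services mentioned in the slides, and the function names (map_words, shuffle, reduce_counts, map_reduce) are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map stage: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle stage: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce stage: collapse the values of one key into a single total."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an iterable of lines."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data is big", "data is data"]))
```

In a real cluster the map and reduce calls would run in parallel on different machines, and the shuffle would move data between them over the network; the sketch only shows the data flow.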