Big Data & Oracle Technologies

    1. Big Data & Oracle Technologies. Kiev, October 2013. Practic Consulting, an alliance of professional IT and management consultants (http://practic-consulting.com)
    2. Agenda
       • What: about Big Data
       • When: industry examples of Big Data
       • How: Oracle NoSQL, Oracle R, Oracle Endeca
    3. Part I: What Is Big Data?
    4. What Is Big Data?
       • Big Data is data that has grown large enough that it cannot be processed using conventional methods.
       • Big Data also names the new generation of data warehousing and business analysis systems built to handle such data.
    5. A Wider Variety of Data
       • Internet data: clickstream, social media, social media streams, website logs
       • Research data: experiments, observations, surveys, marketplace data
       • Healthcare data: treatment data, telehealth, national electronic health records, procedures
       • Image data: images, video, satellite imagery, surveillance
       • Device data: RF devices, sensors, EDI, telemetry
    6. Why Is Big Data Important? Just another buzzword, or a powerful enabler for business and science? The analytics it supports range from business intelligence to advanced analytics:
       • SQL analytics: count, mean, OLAP
       • Descriptive analytics: univariate distribution, central tendency, dispersion
       • Data mining: association rules, clustering, feature extraction
       • Predictive analytics: classification, regression, forecasting, spatial analysis, machine learning, text analytics
       • Simulation: Monte Carlo, agent-based modeling, discrete event modeling
       • Optimization: linear optimization, non-linear optimization
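To make the "SQL analytics" and "descriptive analytics" rows concrete, here is a minimal sketch using only Python's standard `statistics` module. The `describe` helper and the sample data are hypothetical, made up for illustration; they are not taken from the deck.

```python
import statistics

def describe(values):
    """Return count, central tendency, and dispersion for a numeric sample."""
    return {
        "count": len(values),                  # SQL-style count
        "mean": statistics.mean(values),       # central tendency
        "median": statistics.median(values),   # central tendency, robust to outliers
        "stdev": statistics.pstdev(values),    # dispersion (population standard deviation)
        "range": max(values) - min(values),    # dispersion
    }

# Hypothetical daily order counts with one unusual day (410)
orders = [120, 135, 128, 410, 131, 126, 140]
summary = describe(orders)
print(summary)
```

In this sample the single outlier pulls the mean up to 170 while the median stays at 131, which is why descriptive analytics usually reports both.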
    7. Part II: Industry Examples of Big Data
    8. Marketing & Sales + Big Data: the DataXu advertising platform (http://www.dataxu.com/), working with clickstream and behavior data. It must deliver an answer within 100 milliseconds while handling 100,000 ads per second.
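The two figures on this slide can be combined with Little's law (L = λW, average requests in the system equals arrival rate times time in the system). Little's law is not mentioned in the deck; it is swapped in here as a back-of-the-envelope check, assuming in the worst case that every request occupies its full 100 ms budget.

```python
def in_flight(arrival_rate_per_s: float, time_in_system_s: float) -> float:
    """Little's law: average number of requests in the system, L = lambda * W."""
    return arrival_rate_per_s * time_in_system_s

ADS_PER_SECOND = 100_000  # arrival rate from the slide
DEADLINE_MS = 100         # response budget from the slide

# Worst case (assumption): every request uses its full 100 ms budget.
concurrent = in_flight(ADS_PER_SECOND, DEADLINE_MS / 1000)
print(f"about {concurrent:,.0f} requests in flight")
```

Under that assumption the platform must sustain roughly 10,000 concurrent requests; faster average responses lower the number proportionally.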
    9. Retail + Big Data: Wal-Mart online marketing (http://www.walmart.com/), working with social media data. It captures 1,000 tweets per second, and its data grows by 10 TB per day.
    10. Health Care + Big Data: the Cancer Genomics Hub (https://cghub.ucsc.edu/index.html/), storing DNA and RNA data. Its data grows by 10 TB each month, with 10,000 patients involved.
    11. Science + Big Data: a catalog of the universe built from telescope data (http://www.skatelescope.org/). Seven telescopes capture 2 MB per second; within the next 10-15 years, all telescopes together will receive 30 TB per second.
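Slides 9-11 quote growth in different units (per second, per day, per month). The sketch below normalizes them to average bytes per second so they can be compared. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a 30-day month; both are assumptions, not figures from the deck.

```python
SECONDS_PER_DAY = 86_400
MB = 10**6   # decimal megabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)

def per_second(amount, period_s):
    """Convert an amount accumulated over a period into an average rate per second."""
    return amount / period_s

walmart_bytes_s = per_second(10 * TB, SECONDS_PER_DAY)     # +10 TB per day
cghub_bytes_s = per_second(10 * TB, 30 * SECONDS_PER_DAY)  # +10 TB per month, 30-day month assumed
ska_now_bytes_s = 2 * MB                                   # 2 MB per second today
ska_future_bytes_s = 30 * TB                               # 30 TB per second in 10-15 years
walmart_tweets_day = 1_000 * SECONDS_PER_DAY               # 1,000 tweets per second, around the clock

for name, rate in [("Wal-Mart", walmart_bytes_s), ("CGHub", cghub_bytes_s),
                   ("Telescopes today", ska_now_bytes_s),
                   ("Telescopes future", ska_future_bytes_s)]:
    print(f"{name:>18}: {rate / MB:>14,.1f} MB/s")
print(f"Tweets per day: {walmart_tweets_day:,}")
print(f"Telescope growth factor: {ska_future_bytes_s / ska_now_bytes_s:,.0f}x")
```

Normalized this way, the retail example averages about 115.7 MB/s, the genomics hub about 3.9 MB/s, 1,000 tweets per second is 86.4 million tweets per day, and the telescope projection is a 15-million-fold increase.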
    12. Part III: Oracle Technologies
    13. Oracle NoSQL: Big Data Storage Choices
        • Hadoop Distributed File System (HDFS): a file system; parallel scanning; no inherent structure; high-volume writes; batch-oriented
        • Oracle NoSQL Database: a database; indexed storage; simple data structure; high-volume random reads and writes; real-time
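The two access patterns on this slide can be illustrated in a few lines of Python. This is an analogy only, not how HDFS or Oracle NoSQL Database work internally: the batch pattern reads every record in parallel chunks and combines partial results, while the real-time pattern answers a single-key read from an index. The helper names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(seq, n_chunks):
    """Split a list into at most n_chunks contiguous parts."""
    size = max(1, -(-len(seq) // n_chunks))  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def parallel_scan_total(records, workers=4):
    """Batch pattern: every record is read; chunks are scanned concurrently."""
    # Threads illustrate partitioned scanning; CPython threads do not speed up CPU-bound work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda part: sum(v for _, v in part), chunk(records, workers)))

records = [(f"user{i}", i % 10) for i in range(1_000)]  # hypothetical (key, value) records
index = dict(records)  # real-time pattern: key -> value index for random reads

total = parallel_scan_total(records)  # touches all 1,000 records
one_value = index["user42"]           # touches one entry
print(total, one_value)
```

The scan is the right tool when the question involves all the data (a total); the index is the right tool when the question involves one key.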
    14. Oracle NoSQL: RDBMS vs. NoSQL
        • RDBMS
          – High-value, high-density, complex data
          – Complex data relationships
          – Schema-centric
          – Designed to scale up and out
          – Many general-purpose features and functions
          – Result: high overhead ($ per operation)
        • NoSQL architectures
          – Low-value, low-density, simple data
          – Very simple relationships
          – Schema-free, unstructured or semi-structured data
          – Distributed storage and processing
          – Stripped-down, special-purpose data store
          – Result: lower overhead ($ per operation)
    15. Oracle NoSQL: A Distributed, Scalable Key-Value Database
        • Simple data model
        • Small, distributed footprint
        • Highly scalable and available
        • Transparent load balancing
        • Integrates with the Oracle stack
        • Architecture: each application embeds the NoSQL Database driver, which connects to storage nodes in multiple datacenters (Datacenter A and Datacenter B)
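One common way a key-value store achieves transparent load balancing is hash partitioning: the driver hashes a key to a fixed partition and maps partitions onto storage nodes, so applications never choose a node themselves. The sketch below shows that generic idea; the hash function, partition count, and shard names are hypothetical and are not Oracle NoSQL Database's actual implementation. Hashing only the major key keeps all records of one major key together, which fits the transaction scope described on the next slide.

```python
import hashlib

N_PARTITIONS = 30                           # hypothetical; fixed when the store is created
SHARDS = ["shard-1", "shard-2", "shard-3"]  # hypothetical shard names

def partition_for(major_key):
    """Hash only the major-key path, so all records of one major key share a partition."""
    path = "/".join(major_key).encode("utf-8")
    digest = hashlib.sha256(path).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS

def shard_for(partition):
    """Spread partitions evenly over shards (round-robin assignment)."""
    return SHARDS[partition % len(SHARDS)]

def route(major_key, minor_key):
    """Pick the shard for a record; the minor key is deliberately ignored."""
    return shard_for(partition_for(major_key))

for user in ["user1", "user2", "user3", "user4"]:
    print(user, "->", route([user], ["address", "email"]))
```

In a scheme like this, having more partitions than shards means that adding a shard moves whole partitions instead of rehashing every key.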
    16. Oracle NoSQL: Key-Value Pairs
        • Simple data model: key-value pairs (major + minor key paradigm)
        • Simple operations: read, insert, update, delete; read-modify-write (RMW) support
        • Transaction scope: records within a single major key, in a single API call
        • Unordered scan of all data (non-transactional)
        • Example: major key "userid"; sub (minor) keys "address" and "subscriptions"; values such as email ID, phone number, and expiration date. Keys are strings; values are byte arrays.
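The major/minor key model and read-modify-write support can be sketched as an in-memory Python class. `MajorMinorStore` and its methods are hypothetical and only mimic the ideas on the slide; this is not the Oracle NoSQL Database Java API. RMW is modeled as a version check: a write succeeds only if the record has not changed since it was read.

```python
class MajorMinorStore:
    """Hypothetical in-memory model of a major/minor key-value store (not the Oracle API)."""

    def __init__(self):
        # (major-key tuple, minor-key tuple) -> (version, value bytes)
        self._records = {}

    def put(self, major, minor, value: bytes) -> int:
        """Insert or update a record and return its new version number."""
        key = (tuple(major), tuple(minor))
        new_version = self._records.get(key, (0, b""))[0] + 1
        self._records[key] = (new_version, value)
        return new_version

    def get(self, major, minor):
        """Return (version, value), or None if the record does not exist."""
        return self._records.get((tuple(major), tuple(minor)))

    def delete(self, major, minor) -> bool:
        return self._records.pop((tuple(major), tuple(minor)), None) is not None

    def put_if_version(self, major, minor, value: bytes, expected_version: int):
        """Read-modify-write: write only if the record is unchanged since it was read."""
        current = self.get(major, minor)
        if current is None or current[0] != expected_version:
            return None  # changed or deleted by someone else; the caller should re-read
        return self.put(major, minor, value)

    def multi_get(self, major):
        """Return every record sharing one major key, keyed by minor key."""
        major = tuple(major)
        return {minor: value for (maj, minor), (_, value) in self._records.items() if maj == major}

store = MajorMinorStore()
user = ["user42"]  # hypothetical major key
store.put(user, ["address", "email"], b"user42@example.com")
store.put(user, ["address", "phone"], b"+1-555-0100")
store.put(user, ["subscriptions", "expiration"], b"2014-10-01")

version, _ = store.get(user, ["address", "email"])
updated = store.put_if_version(user, ["address", "email"], b"new@example.com", version)
stale = store.put_if_version(user, ["address", "email"], b"old@example.com", version)
profile = store.multi_get(user)
print(updated, stale, sorted(profile))
```

The second conditional write fails because it still carries the old version, which is how RMW avoids lost updates without locking.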
    17. Oracle NoSQL: Online Display Advertising
    18. Oracle NoSQL: Getting Started with Oracle NoSQL DB
        1. Download from OTN: www.oracle.com/technetwork/products/nosqldb/downloads/index.html
        2. Review the Quick Start and the Getting Started Guide
        3. Review the Programmatic API Guide
        4. Start writing Java code
    19. What Is R?
        • R is an open source language and environment for statistical computing and graphics (http://www.R-project.org/)
        • Started in 1994 as an alternative to SAS, SPSS, and other proprietary statistical environments
        • The R environment is an integrated suite of software facilities for data manipulation, calculation, and graphical display
        • Around 2 million R users worldwide
          – Widely taught in universities
          – Many corporate analysts know and use R
        • Thousands of open source R packages enhance productivity in areas such as:
          – Bioinformatics
          – Spatial statistics
          – Financial market analysis
    20. Why Do Statisticians and Data Analysts Use R? The R environment is:
        • Powerful
        • Extensible
        • Graphical
        • Rich in statistical functionality
        • Ready out of the box, with many ‘knobs’ but smart defaults
        • Easy to install and use
        • Free
    21. Limitations of R
        • R is a client and server bundled together as one executable
          – Single-user tool, like Excel
          – Single-threaded
          – Cannot use multiple CPUs without special packages and coding
        • R requires data to be loaded into memory first
          – Loading data can be a limitation given the RAM available on laptops and desktops
          – R's copy-on-modify semantics mean that when a function modifies its argument, a complete copy of the data is made
          – As a result, you can quickly run into memory limits
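The in-memory limitation is usually worked around by processing data in chunks, so that only one chunk is resident at a time. The Python sketch below computes a mean that way; it illustrates the general chunking technique only and is not how Oracle R Enterprise or the Hadoop connector work internally. The helper names and the chunk size are hypothetical.

```python
import io

def read_in_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size parsed numbers; only one list is in memory at a time."""
    chunk = []
    for line in lines:
        chunk.append(float(line))  # float() tolerates the trailing newline
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def streaming_mean(lines, chunk_size=1000):
    """Accumulate count and sum per chunk instead of loading the whole dataset."""
    count, total = 0, 0.0
    for chunk in read_in_chunks(lines, chunk_size):
        count += len(chunk)
        total += sum(chunk)
    return total / count if count else float("nan")

# A StringIO stands in for a large file read line by line.
data = io.StringIO("\n".join(str(i) for i in range(1, 10_001)))
mean = streaming_mean(data, chunk_size=256)
print(mean)
```

Memory use is bounded by the chunk size rather than the dataset size, at the cost of restricting computations to ones that can be combined chunk by chunk (sums, counts, minima).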
    22. Oracle R Connector for Hadoop
        • Provides transparent access to a Hadoop cluster: MapReduce and HDFS-resident data
        • R users do not need to learn a new language or interface to work with Hadoop
        • R users can execute jobs on a Hadoop cluster without knowledge of Hadoop internals, the Hadoop CLI, or IT infrastructure
        • Open source R packages can be applied to HDFS-resident data
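The connector hides the MapReduce programming model behind R. As a reminder of what that model is, here is a single-process Python sketch of its three phases (map, shuffle, reduce) on a word count. No Hadoop or connector API is used; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)

def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = map_reduce(["big data big ideas", "Big data in Hadoop"])
print(counts)
```

On a cluster, the map calls run in parallel near the HDFS blocks holding each line, and each reducer receives only the keys assigned to it; the logic per phase stays the same.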
    23. Oracle R Enterprise
        • Provides a familiar R environment for operating on database-resident data
        • Overloads base R functions for scalable execution in Oracle Database
          – Automatically generates SQL from R and submits the query to the database
          – Leverages table parallelism where applicable
        • Enables embedded execution of R scripts on the Oracle Database server
          – Provides a database-controlled, data-parallel execution framework
          – Allows use of open source R packages from CRAN
        • Integrates structured results and graphics with OBIEE dashboards and BI Publisher documents
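The idea behind "generates SQL and submits the query to the database" is computation pushdown: the aggregate runs where the data lives, and only the result travels back. The sketch below contrasts the two approaches in Python, with the built-in `sqlite3` module standing in for Oracle Database. The function names are hypothetical and this is not the Oracle R Enterprise API.

```python
import sqlite3

def mean_pushdown(conn, table, column):
    """Generate SQL for the mean and let the database compute it; one number returns."""
    # Identifiers cannot be bound as SQL parameters, so table/column must be trusted names.
    return conn.execute(f"SELECT AVG({column}) FROM {table}").fetchone()[0]

def mean_client_side(conn, table, column):
    """Pull every row into client memory, then compute the mean locally."""
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    return sum(r[0] for r in rows) / len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales (amount) VALUES (?)", [(float(i),) for i in range(1, 101)])

print(mean_pushdown(conn, "sales", "amount"), mean_client_side(conn, "sales", "amount"))
```

Both calls return the same answer, but the client-side version transfers all 100 rows while the pushdown version transfers one value; with database-sized tables that difference is what lets the client avoid R's in-memory limits.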
    24. Oracle R Links
        • Blog: https://blogs.oracle.com/R/
        • Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
        • Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
        • ROracle: http://cran.r-project.org/web/packages/ROracle
        • Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
        • Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview
    25. Other Oracle Big Data Products
        • Oracle Endeca Information Discovery: http://www.oracle.com/us/solutions/business-analytics/business-intelligence/endeca/overview/index.html
        • Oracle Data Integrator Application Adapter for Hadoop: http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html
        • Oracle Loader for Hadoop: http://www.oracle.com/technetwork/bdc/hadoop-loader/learnmore/index.html
    26. The End. "The best way to predict the future is to create it!" (Peter F. Drucker)
