Introducing Big Data Appliance X4-2 - for SCs
 

Usage Rights: © All Rights Reserved

  • First version in 2011 (X2), then X3, now X4.
  • Cloudera Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase without requiring data movement or transformation. Cloudera Search brings full-text, interactive search and scalable, flexible indexing to CDH and your enterprise data hub. HBase is the Hadoop database for random, real-time read/write access to big data (non-relational, written in Java); it runs on top of HDFS. BDR (Backup and Disaster Recovery) provides HDFS replication and Hive replication. Navigator is a fully integrated data management application for Apache Hadoop-based systems (audit and access control, lifecycle, discovery, data quality). It helps answer questions such as: who has access to which data objects, which data objects were accessed by a user, when a data object was accessed and by whom, what data assets were accessed using a service, which device was used to access them, and so on. Cloudera Manager is the industry’s first and most sophisticated management application for Apache Hadoop and the enterprise data hub.
  • Start small and grow. Automatically optimize the configuration as you grow… moving services if required, updating config properties…
  • Oracle NoSQL Database indexes the data and supports transactions. It has relaxed consistency rules, no schema structure, and only modest support for joins. Its key characteristics: it uses a system-defined, consistent hash index for data distribution; supports high availability through replication; provides single-record, single-operation transactions with relaxed consistency guarantees; and provides a Java API.
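The "system-defined, consistent hash index for data distribution" mentioned above can be sketched as follows. This is a toy illustration in Python, not Oracle NoSQL Database's actual implementation or Java API: a key's hash deterministically selects a partition, and each partition is owned by a storage node. All class and node names here are invented for the example.

```python
import hashlib

class PartitionedStoreSketch:
    """Toy key-value store illustrating hash-based data distribution."""

    def __init__(self, num_partitions=8, nodes=("nodeA", "nodeB", "nodeC")):
        self.num_partitions = num_partitions
        # Simple static partition -> node assignment; a real system
        # rebalances partitions as nodes are added or removed.
        self.partition_owner = {p: nodes[p % len(nodes)]
                                for p in range(num_partitions)}
        self.data = {p: {} for p in range(num_partitions)}

    def _partition(self, key: str) -> int:
        # Consistent hashing idea: the same key always maps to the
        # same partition, so reads and writes land on the same node.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.num_partitions

    def put(self, key: str, value):
        self.data[self._partition(key)][key] = value

    def get(self, key: str):
        return self.data[self._partition(key)].get(key)

    def owner(self, key: str) -> str:
        return self.partition_owner[self._partition(key)]

store = PartitionedStoreSketch()
store.put("user/42/profile", {"name": "Alice"})
assert store.get("user/42/profile") == {"name": "Alice"}
```

Because the key-to-partition mapping is a pure function of the key, any client can route a single-record operation directly to the owning node, which is what enables the single-operation transactions described above.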
  • Let’s take a look at a third common use case that’s related to real-time event processing: sensor data management.
  • Going shopping, used to be a very simple process. Perhaps some of you remember that time. A customer walked in, found what they wanted, paid for it and walked out. There was typically almost no interaction with the store and certainly no personalized experience. From the standpoint of the customer it was a very simple, limited set of steps.
  • From the viewpoint of the store it was similar. They didn’t really interact with the customer at all, unless the customer asked for help. The only record of the customer ever visiting the store was in the form of a sales receipt. Most purchases consisted of a few items. There was no opportunity to recommend other products, customize the experience, or learn about how the customer experienced their purchase.
  • With the web, everything changes. A customer’s actions can be captured. Navigation and other information can be presented for that customer, providing a personalized experience. Customers can record comments, suggestions, reviews, etc. Every customer visit becomes an opportunity to learn more about the customer and guide their shopping experience.
  • The end goal of shopping (at least from the viewpoint of the merchant) is to end up with a purchase. This major transaction, involving an exchange of cash, a check, or a credit card, has always been captured in the information systems of the retailer. But to get to that financial transaction, purchases over the web can involve hundreds of steps. And on the web, handling them correctly requires active participation from the retailer. You can’t just wait for the purchase any more. Tracking what customers put in the shopping cart is just part of the process. You also have to do all these other things, like capturing their comments, tracking their ratings, or keeping lists up to date. Web sites can also capture other information relative to each customer: how long did they stay, where did they look, what did they compare, how hard was it to get to the product. All of this information tells the store how it is doing and can be reused the next time the customer visits to deliver a more personal experience.
  • And a personalized experience is important to the whole process. Here are a few of the common interactions that people just expect: a personalized greeting; product recommendations (based on history, market segments, friends, product trends, etc.); product comments and ratings; remembered lists (birthdays, anniversaries, special events, personal “I wanna” lists, etc.); notifications and reminders of upcoming events and past experiences; remembered shipping and payment information that makes things simple. Everyone has experienced this in one form or another on the web. The more these applications know, the more they can personalize the shopping experience. Every web page that I visit encapsulates an opportunity to provide personalized content. At one level a personalized experience is now just part of being in the business; it’s hard to be competitive without it. But it’s also a potential differentiator that will make people come back to your site and spend their time and money with you, not a competitor.
  • All of this personalized interaction is based on a very simple concept: each customer is represented by a rich customer profile with information specific to that customer, past history, and recommendations for the future. These profiles are not static; they evolve over time, capturing new types of information and new recommendations and continually adding new details that can be leveraged to further personalize the experience. That profile is enabled by two key capabilities. The first is low-latency access to relevant data. A web page is a wealth of dynamic content, generated on the fly by hundreds of individual queries. These queries need to return with ultra-low latency because people will not wait for web pages. (There’s the story from Amazon stating that a 10 ms delay in response time mapped to a 1% loss of revenue.) Additionally, not every web page (i.e., query) needs all of the information from a given customer profile; it only needs the information that is relevant. The second attribute is scalability. Data only grows: catalogs, product information, customer profiles, historic data, customer ratings and comments, etc. Repositories need to be able to scale as the amount of data and processing increases, and while scaling they must continue to deliver that low-latency access.
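The access pattern described above, fetching only the relevant slice of a customer profile per page, can be sketched minimally. This is an illustrative in-memory stand-in, not any particular product's API; the customer ID and field names are invented.

```python
# Toy profile store: one record per customer, but each page query
# reads only the fields it needs (a cheap single-key lookup).
profiles = {
    "cust-1001": {
        "greeting_name": "Dana",
        "recommendations": ["tent", "sleeping bag"],
        "payment_token": "tok_xyz",   # hypothetical stored payment reference
        "history": ["boots", "stove"],
    }
}

def read_profile(customer_id, fields):
    """Return only the requested fields of one customer profile."""
    profile = profiles.get(customer_id, {})
    return {f: profile[f] for f in fields if f in profile}

# A product page needs just the greeting and recommendations,
# not the payment details or full history:
page_data = read_profile("cust-1001", ["greeting_name", "recommendations"])
```

Keeping each page query to a single keyed read of a small field subset is what makes the ultra-low-latency requirement achievable as profiles grow.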
  • Use a Flume sink, or MapReduce with Apache Tika, to ingest multiple document types.
  • When Hadoop was initially designed (Doug Cutting was building a scalable search engine), strong security was definitely not top of mind. But the use cases around Hadoop have grown rapidly, and critical data is now being stored in HDFS. Our sales plays, extend the DW and ETL offload, are also resonating with customers: it’s an entry into big data that makes sense, a pain/opportunity they currently feel. This leads to an evolution from independent clusters to integrated data management systems: a single logical system. Customers expect the same security and enterprise management from big data that they have with their Oracle Databases. Performance is the other bucket we improve with every release. Oracle XQuery for Hadoop is a transformation engine for semi-structured big data. It runs transformations expressed in the XQuery language by translating them into a series of MapReduce jobs, which are executed in parallel on the Apache Hadoop cluster. You can focus on data movement and transformation logic instead of the complexities of Java and MapReduce, without sacrificing scalability or performance. The input data can be located in a file system accessible through the Hadoop File System API, including the Hadoop Distributed File System (HDFS), or stored in Oracle NoSQL Database. Oracle XQuery for Hadoop can write the transformation results to Hadoop files, Oracle NoSQL Database, or Oracle Database. It also provides extensions to Apache Hive to support massive XML files; these extensions are available only on Oracle Big Data Appliance. Oracle XQuery for Hadoop is based on mature industry standards including XPath, XQuery, and the XQuery Update Facility. It is fully integrated with other Oracle products: it loads data efficiently into Oracle Database using Oracle Loader for Hadoop and provides read and write support to Oracle NoSQL Database.
  • Authentication: Kerberos security is supported as a software installation option. Authorization: at the HDFS file level, Apache Sentry integrates with the Hive and Impala SQL query engines to provide fine-grained authorization to data and metadata stored in Hadoop. On-disk encryption: password-based encryption encodes Hadoop data based on a password, which is the same for all servers in a cluster; TPM encryption encodes Hadoop data using the Trusted Platform Module (TPM) chip on the server motherboard.
  • Audit Vault & DB Firewall: an integrated view, auditing the entire enterprise. BDA, databases (DB2, SQL Server, Oracle), and file systems are all monitored. AV provides rich integrated reporting that lets you track this activity, and you can define alerts that trigger on different policy violations, e.g., a user attempting three times to access data they should not have been accessing. Cloudera provides Navigator, which gives a view of what is happening on the Hadoop side, but it does not provide an integrated, consolidated view across the enterprise.
  • Along with security, enterprise systems management across all engineered systems really resonates with customers. Stress consistency across engineered systems: you will see the same hardware monitoring here as with other engineered systems. It is the same unless there is a need to deviate.
  • Enables Map-Reduce-style R calculations with the Big Data Appliance and HDFS. Supports compute-intensive parallelism for simulations. ORCH provides optimized R algorithms that are robust, numerically accurate, and linearly scalable on Hadoop and the Big Data Appliance; more cores achieve a proportional decrease in run times while matching the R user experience. Added support for new models, bringing the list to: linear models and logistic models, general feed-forward neural networks, regression models, matrix factorization (algorithms for large-scale matrix problems), k-means clustering, PCA (principal component analysis), and correlations.
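The map/shuffle/reduce pattern that underlies the "Map-Reduce-style calculations" above can be sketched in a few lines. Shown here in Python for brevity (ORCH itself runs R code on Hadoop); it computes a per-group mean, the kind of partial-aggregation step behind algorithms like k-means and regression. The sensor names and values are invented.

```python
from collections import defaultdict

records = [("sensorA", 2.0), ("sensorB", 3.0), ("sensorA", 4.0)]

# Map: each record emits a (key, (sum, count)) partial result.
mapped = [(key, (value, 1)) for key, value in records]

# Shuffle: group partials by key (Hadoop does this between phases).
groups = defaultdict(list)
for key, partial in mapped:
    groups[key].append(partial)

# Reduce: combine each key's partials into a final mean.
means = {key: sum(s for s, _ in parts) / sum(c for _, c in parts)
         for key, parts in groups.items()}
# means == {"sensorA": 3.0, "sensorB": 3.0}
```

Because the map and reduce steps only see partial sums and counts, the computation parallelizes across cores and nodes, which is why adding cores gives the proportional speedup the notes claim.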
  • Offering notes: this offering is now available. It is a subscription for support; the license for NoSQL DB Community Edition is open source and free (available via OTN). The price of the support offering is $2,000 per year. This is an annual subscription; the user must purchase a new subscription each year on the Store. The offering does provide for Severity 1 SRs, per normal Oracle support policies.
