Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
Once insights and analysis have been produced within your Hadoop cluster by analysts and technical staff, it’s usually the case that you want to share the output with a wider audience in the organisation. Oracle Business Intelligence has connectivity to Hadoop through Apache Hive compatibility, and other Oracle tools such as Oracle Big Data Discovery and Big Data SQL can be used to visualise and publish Hadoop data. In this final session we’ll look at what’s involved in connecting these tools to your Hadoop environment, and also consider where data is optimally located when large amounts of Hadoop data need to be analysed alongside more traditional data warehouse datasets
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Mark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014
In this presentation we cover some key Hadoop concepts including HDFS, MapReduce, Hive and NoSQL/HBase, with the focus on Oracle Big Data Appliance and Cloudera Distribution including Hadoop. We explain how data is stored on a Hadoop system and the high-level ways it is accessed and analysed, and outline Oracle’s products in this area including the Big Data Connectors, Oracle Big Data SQL, and Oracle Business Intelligence (OBI) and Oracle Data Integrator (ODI).
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
There are many ways to ingest (load) data into a Hadoop cluster, from file copying using the Hadoop Filesystem (FS) shell through to real-time streaming using technologies such as Flume and Hadoop streaming. In this session we’ll take a high-level look at the data ingestion options for Hadoop, and then show how Oracle Data Integrator and Oracle GoldenGate leverage these technologies to load and process data within your Hadoop cluster. We’ll also consider the updated Oracle Information Management Reference Architecture and look at the best places to land and process your enterprise data, using Hadoop’s schema-on-read approach to hold low-value, low-density raw data, and then use the concept of a “data factory” to load and process your data into more traditional Oracle relational storage, where we hold high-density, high-value data.
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIMark Rittman
The document discusses Oracle's Big Data SQL, which brings Oracle SQL capabilities to Hadoop data stored in Hive tables. It allows querying Hive data using standard SQL from Oracle Database and viewing Hive metadata in Oracle data dictionary tables. Big Data SQL leverages the Hive metastore and uses direct reads and SmartScan to optimize queries against HDFS and Hive data. This provides a unified SQL interface and optimized query processing for both Oracle and Hadoop data.
ODI12c as your Big Data Integration HubMark Rittman
Presentation from the recent Oracle OTN Virtual Technology Summit, on using Oracle Data Integrator 12c to ingest, transform and process data on a Hadoop cluster.
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...Mark Rittman
This document discusses an end-to-end example of using Hadoop, OBIEE, ODI and Oracle Big Data Discovery to analyze big data from various sources. It describes ingesting website log data and Twitter data into a Hadoop cluster, processing and transforming the data using tools like Hive and Spark, and using the results for reporting in OBIEE and data discovery in Oracle Big Data Discovery. ODI is used to automate the data integration process.
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13Mark Rittman
The latest releases of OBIEE and ODI come with the ability to connect to Hadoop data sources, using MapReduce to integrate data from clusters of "big data" servers complementing traditional BI data sources. In this presentation, we will look at how these two tools connect to Apache Hadoop and access "big data" sources, and share tips and tricks on making it all work smoothly.
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
This document discusses using Hadoop and Hive for ETL work. It provides an overview of using Hadoop for distributed processing and storage of large datasets. It describes how Hive provides a SQL interface for querying data stored in Hadoop and how various Apache tools can be used to load, transform and store data in Hadoop. Examples of using Hive to view table metadata and run queries are also presented.
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...Mark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014
In this presentation we cover some key Hadoop concepts including HDFS, MapReduce, Hive and NoSQL/HBase, with the focus on Oracle Big Data Appliance and Cloudera Distribution including Hadoop. We explain how data is stored on a Hadoop system and the high-level ways it is accessed and analysed, and outline Oracle’s products in this area including the Big Data Connectors, Oracle Big Data SQL, and Oracle Business Intelligence (OBI) and Oracle Data Integrator (ODI).
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
There are many ways to ingest (load) data into a Hadoop cluster, from file copying using the Hadoop Filesystem (FS) shell through to real-time streaming using technologies such as Flume and Hadoop streaming. In this session we’ll take a high-level look at the data ingestion options for Hadoop, and then show how Oracle Data Integrator and Oracle GoldenGate leverage these technologies to load and process data within your Hadoop cluster. We’ll also consider the updated Oracle Information Management Reference Architecture and look at the best places to land and process your enterprise data, using Hadoop’s schema-on-read approach to hold low-value, low-density raw data, and then use the concept of a “data factory” to load and process your data into more traditional Oracle relational storage, where we hold high-density, high-value data.
BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODIMark Rittman
The document discusses Oracle's Big Data SQL, which brings Oracle SQL capabilities to Hadoop data stored in Hive tables. It allows querying Hive data using standard SQL from Oracle Database and viewing Hive metadata in Oracle data dictionary tables. Big Data SQL leverages the Hive metastore and uses direct reads and SmartScan to optimize queries against HDFS and Hive data. This provides a unified SQL interface and optimized query processing for both Oracle and Hadoop data.
ODI12c as your Big Data Integration HubMark Rittman
Presentation from the recent Oracle OTN Virtual Technology Summit, on using Oracle Data Integrator 12c to ingest, transform and process data on a Hadoop cluster.
End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...Mark Rittman
This document discusses an end-to-end example of using Hadoop, OBIEE, ODI and Oracle Big Data Discovery to analyze big data from various sources. It describes ingesting website log data and Twitter data into a Hadoop cluster, processing and transforming the data using tools like Hive and Spark, and using the results for reporting in OBIEE and data discovery in Oracle Big Data Discovery. ODI is used to automate the data integration process.
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13Mark Rittman
The latest releases of OBIEE and ODI come with the ability to connect to Hadoop data sources, using MapReduce to integrate data from clusters of "big data" servers complementing traditional BI data sources. In this presentation, we will look at how these two tools connect to Apache Hadoop and access "big data" sources, and share tips and tricks on making it all work smoothly.
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
This document discusses using Hadoop and Hive for ETL work. It provides an overview of using Hadoop for distributed processing and storage of large datasets. It describes how Hive provides a SQL interface for querying data stored in Hadoop and how various Apache tools can be used to load, transform and store data in Hadoop. Examples of using Hive to view table metadata and run queries are also presented.
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsMark Rittman
- The document discusses Oracle tools for extracting, transforming, and loading (ETL) big data from Hadoop into Oracle databases, including Oracle Data Integrator 12c, Oracle Loader for Hadoop, and Oracle Direct Connector for HDFS.
- It provides an overview of using Hadoop for ETL tasks like data loading, processing, and exporting data to structured databases, as well as tools like Hive, Pig, and Spark for these functions.
- Key benefits of the Oracle Hadoop connectors include pushing data transformations to Hadoop clusters for scale and leveraging SQL interfaces to access Hadoop data for business intelligence.
What is Big Data Discovery, and how it complements traditional business anal...Mark Rittman
Data Discovery is an analysis technique that complements traditional business analytics, and enables users to combine, explore and analyse disparate datasets to spot opportunities and patterns that lie hidden within your data. Oracle Big Data discovery takes this idea and applies it to your unstructured and big data datasets, giving users a way to catalogue, join and then analyse all types of data across your organization.
In this session we'll look at Oracle Big Data Discovery and how it provides a "visual face" to your big data initatives, and how it complements and extends the work that you currently do using business analytics tools.
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Mark Rittman
Presentation from the Rittman Mead BI Forum 2015 masterclass, pt.2 of a two-part session that also covered creating the Discovery Lab. Goes through setting up Flume log + twitter feeds into CDH5 Hadoop using ODI12c Advanced Big Data Option, then looks at the use of OBIEE11g with Hive, Impala and Big Data SQL before finally using Oracle Big Data Discovery for faceted search and data mashup on-top of Hadoop
Real-Time Data Replication to Hadoop using GoldenGate 12c AdaptorsMichael Rainey
Oracle GoldenGate 12c is well known for its highly performant data replication between relational databases. With the GoldenGate Adaptors, the tool can now apply the source transactions to a Big Data target, such as HDFS. In this session, we'll explore the different options for utilizing Oracle GoldenGate 12c to perform real-time data replication from a relational source database into HDFS. The GoldenGate Adaptors will be used to load movie data from the source to HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.
Presented at the Oracle Technology Network Virtual Technology Summit February/March 2015.
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Mark Rittman
This document summarizes a presentation about adding a Hadoop-based data reservoir to an Oracle data warehouse. The presentation discusses using a data reservoir to store large amounts of raw customer data from various sources to enable 360-degree customer analysis. It describes loading and integrating the data reservoir with the data warehouse using Oracle tools and how organizations can use it for more personalized customer marketing through advanced analytics and machine learning.
The document discusses the Lambda architecture, which combines batch and stream processing. It provides an example implementation using Hadoop, Kafka, Storm and other tools. The Lambda architecture handles batch loading and querying of large datasets as well as real-time processing of data streams. It also discusses using YARN and Spark for distributed processing and refreshing enrichments.
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
Enterprise Holding’s first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points to necessary for your own cluster, for example: cloud vs on premises, physical vs virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned with the pieces of our architecture worked well and rant about those which didn’t. No deep Hadoop knowledge is necessary, architect or executive level.
Unlock the value in your big data reservoir using oracle big data discovery a...Mark Rittman
The document discusses Oracle Big Data Discovery and how it can be used to analyze and gain insights from data stored in a Hadoop data reservoir. It provides an example scenario where Big Data Discovery is used to analyze website logs, tweets, and website posts and comments to understand popular content and influencers for a company. The data is ingested into the Big Data Discovery tool, which automatically enriches the data. Users can then explore the data, apply additional transformations, and visualize relationships to gain insights.
Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data StreamingMichael Rainey
We produce quite a lot of data! Much of the data are business transactions stored in a relational database. More frequently, the data are non-structured, high volume and rapidly changing datasets known in the industry as Big Data. The challenge for data integration professionals is to combine and transform the data into useful information. Not just that, but it must also be done in near real-time and using a target system such as Hadoop. The topic of this session, real-time data streaming, provides a great solution for this challenging task. By integrating GoldenGate, Oracle’s premier data replication technology, and Apache Kafka, the latest open-source streaming and messaging system, we can implement a fast, durable, and scalable solution.
Presented at Oracle OpenWorld 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but scan rates that are too slow for large scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
The document discusses Paytm Labs' transition from batch data ingestion to real-time data ingestion using Apache Kafka and Confluent. It outlines their current batch-driven pipeline and some of its limitations. Their new approach, called DFAI (Direct-From-App-Ingest), will have applications directly write data to Kafka using provided SDKs. This data will then be streamed and aggregated in real-time using their Fabrica framework to generate views for different use cases. The benefits of real-time ingestion include having fresher data available and a more flexible schema.
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)Mark Rittman
Presentation from ODTUG KScope'14, Seattle, on using TimesTen as a standalone analytic database, and going beyond the use of the Exalytics Summary Advisor.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old fashioned Hive as a tool for easily and efficiently converting exiting datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
The document discusses Talend's big data solutions and sandbox. It introduces Rajan Kanitkar as a senior solutions engineer at Talend with 15 years of experience in data integration. It then summarizes Talend's big data platform and ecosystem including Hadoop, MapReduce, HDFS, Hive and more. The rest of the document describes Talend's sandbox, which provides a pre-configured virtual image with Hadoop distributions, Talend software, and data scenarios to demonstrate ingesting, transforming and delivering big data.
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSteven Totman
Demand for quicker access to multiple integrated sources of data continues to rise. Immediate access to data stored in a variety of systems - such as mainframes, data warehouses, and data marts - to mine visually for business intelligence is the competitive differentiation enterprises need to win in today’s economy.
Stop playing the waiting game and learn about a new end-to-end solution for combining, analyzing, and visualizing data from practically any source in your enterprise environment.
Leading organizations are already taking advantage of this architectural innovation to gain modern insights while reducing costs and propelling their businesses ahead of the competition.
Are you tired of waiting? Don't let your architecture hold you back. Access this webinar and hear from a team of industry experts on how you can Break the Barriers to Big Data Insight.
Flink in Zalando's world of Microservices ZalandoHayley
Apache Flink Meetup at Zalando Technology, May 2016
By Javier Lopez & Mihail Vieru, Zalando
In this talk we present Zalando's microservices architecture and introduce Saiki – our next generation data integration and distribution platform on AWS. We show why we chose Apache Flink to serve as our stream processing framework and describe how we employ it for our current use cases: business process monitoring and continuous ETL. We then have an outlook on future use cases.
Hadoop at the Center: The Next Generation of HadoopAdam Muise
This document discusses Hortonworks' approach to addressing challenges around managing large volumes of diverse data. It presents Hortonworks' Hadoop Data Platform (HDP) as a solution for consolidating siloed data into a central data lake on a single cluster. This allows different data types and workloads like batch, interactive, and real-time processing to leverage shared services for security, governance and operations while preserving existing tools. The HDP also enables new use cases for analytics like real-time personalization and segmentation using diverse data sources.
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016StampedeCon
Spark 2.0 includes many exciting new features including Structured Streaming, and the unification of Datasets (new in 1.6) with DataFrames. Structured Streaming allows one to define recurrent queries on a stream of data that is handled as an infinite DataFrame. This query is incrementally updated with new data. This allows for code reuse between batch and streaming and an easier logical model to reason about. Datasets, an extension of DataFrames, were added as an experimental feature in Spark 1.6. They allow us to manipulate collections of objects in a type-safe fashion. In Spark 2.0 the two abstractions have been unified and now DataFrame = Dataset[Row]. We will discuss both of these new features and look at practical real world examples.
This is a slide deck that was used for our 11/19/15 Nike Tech Talk to give a detailed overview of the SnappyData technology vision. The slides were presented by Jags Ramnarayan, Co-Founder & CTO of SnappyData
Exploratory Analysis in the Data Lab - Team-Sport or for Nerds only?Harald Erb
Talk held at DOAG 2016 conference (2016.doag.org/de/home) discussing a data lab concept incl.architecture blueprint, collaboration and tool examples based on Oracle solutions like Oracle Big Data Discovery (in combination with Jupyter Notebook)
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsMark Rittman
- The document discusses Oracle tools for extracting, transforming, and loading (ETL) big data from Hadoop into Oracle databases, including Oracle Data Integrator 12c, Oracle Loader for Hadoop, and Oracle Direct Connector for HDFS.
- It provides an overview of using Hadoop for ETL tasks like data loading, processing, and exporting data to structured databases, as well as tools like Hive, Pig, and Spark for these functions.
- Key benefits of the Oracle Hadoop connectors include pushing data transformations to Hadoop clusters for scale and leveraging SQL interfaces to access Hadoop data for business intelligence.
What is Big Data Discovery, and how it complements traditional business anal...Mark Rittman
Data Discovery is an analysis technique that complements traditional business analytics, and enables users to combine, explore and analyse disparate datasets to spot opportunities and patterns that lie hidden within your data. Oracle Big Data discovery takes this idea and applies it to your unstructured and big data datasets, giving users a way to catalogue, join and then analyse all types of data across your organization.
In this session we'll look at Oracle Big Data Discovery and how it provides a "visual face" to your big data initatives, and how it complements and extends the work that you currently do using business analytics tools.
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...Mark Rittman
Presentation from the Rittman Mead BI Forum 2015 masterclass, pt.2 of a two-part session that also covered creating the Discovery Lab. Goes through setting up Flume log + twitter feeds into CDH5 Hadoop using ODI12c Advanced Big Data Option, then looks at the use of OBIEE11g with Hive, Impala and Big Data SQL before finally using Oracle Big Data Discovery for faceted search and data mashup on-top of Hadoop
Real-Time Data Replication to Hadoop using GoldenGate 12c AdaptorsMichael Rainey
Oracle GoldenGate 12c is well known for its highly performant data replication between relational databases. With the GoldenGate Adaptors, the tool can now apply the source transactions to a Big Data target, such as HDFS. In this session, we'll explore the different options for utilizing Oracle GoldenGate 12c to perform real-time data replication from a relational source database into HDFS. The GoldenGate Adaptors will be used to load movie data from the source to HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.
Presented at the Oracle Technology Network Virtual Technology Summit February/March 2015.
Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...Mark Rittman
This document summarizes a presentation about adding a Hadoop-based data reservoir to an Oracle data warehouse. The presentation discusses using a data reservoir to store large amounts of raw customer data from various sources to enable 360-degree customer analysis. It describes loading and integrating the data reservoir with the data warehouse using Oracle tools and how organizations can use it for more personalized customer marketing through advanced analytics and machine learning.
The document discusses the Lambda architecture, which combines batch and stream processing. It provides an example implementation using Hadoop, Kafka, Storm and other tools. The Lambda architecture handles batch loading and querying of large datasets as well as real-time processing of data streams. It also discusses using YARN and Spark for distributed processing and refreshing enrichments.
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
Enterprise Holding’s first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points to necessary for your own cluster, for example: cloud vs on premises, physical vs virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned with the pieces of our architecture worked well and rant about those which didn’t. No deep Hadoop knowledge is necessary, architect or executive level.
Unlock the value in your big data reservoir using oracle big data discovery a...Mark Rittman
The document discusses Oracle Big Data Discovery and how it can be used to analyze and gain insights from data stored in a Hadoop data reservoir. It provides an example scenario where Big Data Discovery is used to analyze website logs, tweets, and website posts and comments to understand popular content and influencers for a company. The data is ingested into the Big Data Discovery tool, which automatically enriches the data. Users can then explore the data, apply additional transformations, and visualize relationships to gain insights.
Oracle GoldenGate and Apache Kafka: A Deep Dive Into Real-Time Data StreamingMichael Rainey
We produce quite a lot of data! Much of the data are business transactions stored in a relational database. More frequently, the data are non-structured, high volume and rapidly changing datasets known in the industry as Big Data. The challenge for data integration professionals is to combine and transform the data into useful information. Not just that, but it must also be done in near real-time and using a target system such as Hadoop. The topic of this session, real-time data streaming, provides a great solution for this challenging task. By integrating GoldenGate, Oracle’s premier data replication technology, and Apache Kafka, the latest open-source streaming and messaging system, we can implement a fast, durable, and scalable solution.
Presented at Oracle OpenWorld 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but scan rates that are too slow for large scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.
2015 nov 27_thug_paytm_rt_ingest_brief_finalAdam Muise
The document discusses Paytm Labs' transition from batch data ingestion to real-time data ingestion using Apache Kafka and Confluent. It outlines their current batch-driven pipeline and some of its limitations. Their new approach, called DFAI (Direct-From-App-Ingest), will have applications directly write data to Kafka using provided SDKs. This data will then be streamed and aggregated in real-time using their Fabrica framework to generate views for different use cases. The benefits of real-time ingestion include having fresher data available and a more flexible schema.
TimesTen - Beyond the Summary Advisor (ODTUG KScope'14)Mark Rittman
Presentation from ODTUG KScope'14, Seattle, on using TimesTen as a standalone analytic database, and going beyond the use of the Exalytics Summary Advisor.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old fashioned Hive as a tool for easily and efficiently converting exiting datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
The document discusses Talend's big data solutions and sandbox. It introduces Rajan Kanitkar as a senior solutions engineer at Talend with 15 years of experience in data integration. It then summarizes Talend's big data platform and ecosystem including Hadoop, MapReduce, HDFS, Hive and more. The rest of the document describes Talend's sandbox, which provides a pre-configured virtual image with Hadoop distributions, Talend software, and data scenarios to demonstrate ingesting, transforming and delivering big data.
Syncsort, Tableau, & Cloudera present: Break the Barriers to Big Data InsightSteven Totman
Demand for quicker access to multiple integrated sources of data continues to rise. Immediate access to data stored in a variety of systems - such as mainframes, data warehouses, and data marts - to mine visually for business intelligence is the competitive differentiation enterprises need to win in today’s economy.
Stop playing the waiting game and learn about a new end-to-end solution for combining, analyzing, and visualizing data from practically any source in your enterprise environment.
Leading organizations are already taking advantage of this architectural innovation to gain modern insights while reducing costs and propelling their businesses ahead of the competition.
Are you tired of waiting? Don't let your architecture hold you back. Access this webinar and hear from a team of industry experts on how you can Break the Barriers to Big Data Insight.
Flink in Zalando's world of Microservices ZalandoHayley
Apache Flink Meetup at Zalando Technology, May 2016
By Javier Lopez & Mihail Vieru, Zalando
In this talk we present Zalando's microservices architecture and introduce Saiki – our next generation data integration and distribution platform on AWS. We show why we chose Apache Flink to serve as our stream processing framework and describe how we employ it for our current use cases: business process monitoring and continuous ETL. We then have an outlook on future use cases.
Hadoop at the Center: The Next Generation of HadoopAdam Muise
This document discusses Hortonworks' approach to addressing challenges around managing large volumes of diverse data. It presents Hortonworks' Hadoop Data Platform (HDP) as a solution for consolidating siloed data into a central data lake on a single cluster. This allows different data types and workloads like batch, interactive, and real-time processing to leverage shared services for security, governance and operations while preserving existing tools. The HDP also enables new use cases for analytics like real-time personalization and segmentation using diverse data sources.
What’s New in Spark 2.0: Structured Streaming and Datasets - StampedeCon 2016StampedeCon
Spark 2.0 includes many exciting new features including Structured Streaming, and the unification of Datasets (new in 1.6) with DataFrames. Structured Streaming allows one to define recurrent queries on a stream of data that is handled as an infinite DataFrame. This query is incrementally updated with new data. This allows for code reuse between batch and streaming and an easier logical model to reason about. Datasets, an extension of DataFrames, were added as an experimental feature in Spark 1.6. They allow us to manipulate collections of objects in a type-safe fashion. In Spark 2.0 the two abstractions have been unified and now DataFrame = Dataset[Row]. We will discuss both of these new features and look at practical real world examples.
This is a slide deck that was used for our 11/19/15 Nike Tech Talk to give a detailed overview of the SnappyData technology vision. The slides were presented by Jags Ramnarayan, Co-Founder & CTO of SnappyData
Exploratory Analysis in the Data Lab - Team-Sport or for Nerds only?Harald Erb
Talk held at DOAG 2016 conference (2016.doag.org/de/home) discussing a data lab concept incl.architecture blueprint, collaboration and tool examples based on Oracle solutions like Oracle Big Data Discovery (in combination with Jupyter Notebook)
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData
Apache Spark 2.0 offers many enhancements that make continuous analytics quite simple. In this talk, we will discuss many other things that you can do with your Apache Spark cluster. We explain how a deep integration of Apache Spark 2.0 and in-memory databases can bring you the best of both worlds! In particular, we discuss how to manage mutable data in Apache Spark, run consistent transactions at the same speed as state-the-art in-memory grids, build and use indexes for point lookups, and run 100x more analytics queries at in-memory speeds. No need to bridge multiple products or manage, tune multiple clusters. We explain how one can take regulation Apache Spark SQL OLAP workloads and speed them up by up to 20x using optimizations in SnappyData.
We then walk through several use-case examples, including IoT scenarios, where one has to ingest streams from many sources, cleanse it, manage the deluge by pre-aggregating and tracking metrics per minute, store all recent data in a in-memory store along with history in a data lake and permit interactive analytic queries at this constantly growing data. Rather than stitching together multiple clusters as proposed in Lambda, we walk through a design where everything is achieved in a single, horizontally scalable Apache Spark 2.0 cluster. A design that is simpler, a lot more efficient, and let’s you do everything from Machine Learning and Data Science to Transactions and Visual Analytics all in one single cluster.
1° Sessione Oracle CRUI: Analytics Data Lab, the power of Big Data Investiga...Jürgen Ambrosi
I dati sono il nuovo Capitale: come il capitale finanziario, sono una risorsa che deve essere gestita, raccolta e tenuta al sicuro, ma deve essere anche investita dalle organizzazioni che vogliono ottenere vantaggio competitivo. I dati non sono una risorsa nuova, ma soltanto oggi per la prima volta sono disponbili in abbondanza assieme alle tecnologie necessarie per massimizzarne il ritorno. Esattamente come l'elettricità fu una curiosità da laboratorio per molto tempo, finché non venne resa disponibile alle masse e dunque cambiò totalmente il volto dell'industria moderna.Ecco perché per accelerare il cambiamento è necessario un approccio innovativo alla esecuzione delle iniziative orientate ai Big Data: un laboratorio analitico come catalizzatore dell'innovazione (Data Lab).In questo webinar sulle tecnologie Oracle, utilizzeremo il consueto approccio del racconto basato su casi d’uso ed esperienze concrete.
Creating truly personal omni-channel customer experiences by Brian Solis and ...Brian Solis
An exclusive ebook written by Brian Solis for SmartFocus. Customers are more connected and more informed than ever. Digital marketers now need an entirely fresh perspective to succeed in a world where customers and prospects experience their brand in multiple ways – online ads, websites, blogs, email, social and more. In retail, the customer journey might also include a visit to a real world store. This eBook, with exclusive video insights from Brian Solis, will explain how to build those journeys and develop an omni-channel marketing strategy by covering topics such as:
What is omni-channel marketing and why is it important?
How to be human and stay tech savvy and the importance of social media
How email marketing is more important than ever
Omni-Channel Retailing: The Future Trend in Fashion and Luxury Industry - Par...Fashionbi
This publication offers you insights into:
-The most important findings from various surveys that were recently conducted on the changing behauviors of consumer
-Best Omni-Channel marketing practices of more than 60 famous brands and retailers.
-More than 25 successful future retail trends and opportunities in Fashion
Ougn2013 high speed, in-memory big data analysis with oracle exalyticsMark Rittman
The document discusses Oracle Exalytics, a platform for high speed, big data analysis. Exalytics combines Oracle Business Intelligence software with specialized hardware to enable high-density visualization of large datasets and support of many concurrent users. It also integrates Oracle Essbase, Endeca, and Hadoop to provide additional analytic capabilities for both structured and unstructured data.
Deploying OBIEE in the Cloud - Oracle Openworld 2014Mark Rittman
Introduction to Oracle BI Cloud Service (BICS) including administration, data upload, creating the repository and creating dashboards and reports. Also includes a short case-study around Salesforce.com reporting created for the BICS beta program.
Presentation from the Rittman Mead BI Forum 2013 on ODI11g's Hadoop connectivity. Provides a background to Hadoop, HDFS and Hive, and talks about how ODI11g, and OBIEE 11.1.1.7+, uses Hive to connect to "big data" sources.
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
This document discusses getting started with big data analytics using Hadoop and Pentaho. It provides an overview of installing and configuring Hadoop and Pentaho on a single machine or cluster. Dell's Crowbar tool is presented as a way to quickly deploy Hadoop clusters on Dell hardware in about two hours. The document also covers best practices like leveraging different technologies, starting with small datasets, and not overloading networks. A demo is given and contact information provided.
Presentation by Mark Rittman, Technical Director, Rittman Mead, on ODI 11g features that support enterprise deployment and usage. Delivered at BIWA Summit 2013, January 2013.
UKOUG Tech 15 - Migration from Oracle Warehouse Builder to Oracle Data Integr...Jérôme Françoisse
This document discusses migrating from Oracle Warehouse Builder (OWB) to Oracle Data Integrator (ODI). It describes the similarities and differences between OWB and ODI, the migration utility for converting OWB objects to ODI, the steps for performing a migration, and issues that may be encountered such as mappings that do not migrate or require changes. It also covers OWB and ODI architecture and using the ODI scheduler and monitoring tools.
Using Endeca with Oracle Exalytics - Oracle France BI Customer Event, October...Mark Rittman
Short presentation at the Oracle France BI Customer event in Paris, October 2013, on the advantage of running Endeca Information Discovery on Oracle Exalytics In-Memory Machine.
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
The document compares various database companies and their products. It provides details on TmaxSoft, including its employee headcount, main products which include databases and middleware, and key technologies. It then discusses Tibero database specifically and how it compares to Oracle, including features, licensing model, compatibility, and security. Examples are also provided showing capabilities of Tibero compared to Oracle.
2-in-1 : RPD Magic and Hyperion Planning "Adapter"Gianni Ceresa
The best way to explain the powerful modeling capabilities of OBIEE is with a real-use case, and for this session it will be the relational database of Hyperion Planning applications, not a model designed for reporting but for application needs. Use the advanced modeling capabilities of OBIEE to transform the relational schema of Hyperion Planning with what we call some "RPD magic." The target is to fill the gap when reporting against Planning, using only Essbase and missing the content of the relational database, to end with a federation of Essbase and the relational source without ETL, just with OBIEE and some magic.
As 11.1.1.9 with a native Planning driver was released just few weeks before the presentation it was changed to cover this new driver and then completed with some proper RPD magic to still keep the dual content.
The document discusses big data and Oracle technologies. It provides an overview of big data, describing what it is and examples of big data in different industries. It then discusses several Oracle technologies for working with big data, including Oracle NoSQL Database for scalable key-value storage, Oracle R for statistical analysis and connecting to Hadoop, and Oracle Endeca for information discovery.
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015Mark Rittman
Slides from a two-day OBIEE11g seminar in Dubai, February 2015, at the Oracle University Expert Summit. Covers the following topics:
1. OBIEE 11g Overview & New Features
2. Adding Exalytics and In-Memory Analytics to OBIEE 11g
3. Source Control and Concurrent Development for OBIEE
4. No Silver Bullets - OBIEE 11g Performance in the Real World
5. Oracle BI Cloud Service Overview, Tips and Techniques
6. Moving to Oracle BI Applications 11g + ODI
7. Oracle Essbase and Oracle BI EE 11g Integration Tips and Techniques
8. OBIEE 11g and Predictive Analytics, Hadoop & Big Data
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...Spark Summit
Legacy enterprise data warehouse (EDW) architecture, geared toward day-to-day workloads associated with operational querying, reporting, and analytics, are often ill-equipped to handle the volume of data, traffic, and varied data types associated with a modern, ad-hoc analytics platform. Faced with challenges of increasing pipeline speed, aggregation, and visualization in a simplified, self-service fashion, organizations are increasingly turning to some combination of Spark, Hadoop, Kafka, and proven analytical databases like Vertica as key enabling technologies to optimize their EDW architecture. Join us to learn how successful organizations have developed real-time streaming solutions with these technologies for range of use cases, including IOT predictive maintenance.
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Mark Rittman
As presented at OGh SQL Celebration Day in June 2016, NL. Covers new features in Big Data SQL including storage indexes, storage handlers and ability to install + license on commodity hardware
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...jdijcks
Learn about the benefits of Oracle Big Data Appliance and how it can drive business value underneath applications and tools. This includes a section by Paul Kent, VP Big Data SAS describing how SAS runs well on Oracle Engineered Systems and on Oracle Big Data Appliance specifically.
Snowflake's Kent Graziano talks about what makes a data warehouse as a service and some of the key features of Snowflake's data warehouse as a service.
SQL Server 2016 introduces new features for business intelligence and reporting. PolyBase allows querying data across SQL Server and Hadoop using T-SQL. Integration Services has improved support for AlwaysOn availability groups and incremental package deployment. Reporting Services adds HTML5 rendering, PowerPoint export, and the ability to pin report items to Power BI dashboards. Mobile Report Publisher enables developing and publishing mobile reports.
Unlock Hadoop Success with Cloudera Navigator OptimizerCloudera, Inc.
Cloudera Navigator Optimizer analyzes existing SQL workloads to provide instant insights into your workloads and turns that into an intelligent optimization strategy so you can unlock peak performance and efficiency with Hadoop.
This document discusses using Hadoop to offload processing from traditional data warehouses. It describes how big data challenges like volume, variety and velocity are straining data warehouses. Hadoop can help by using ELT (extract, load, transform) rather than ETL to handle semi-structured data and apply schemas during loading. This allows data warehouses to query data in Hadoop and achieve cost savings of 50x-100x over storing all data in a data warehouse. The document also provides an example of a telecom company that saved $30 million over 5 years by replacing ETL steps with Hadoop to preprocess billing records.
What it takes to run Hadoop at Scale: Yahoo! PerspectivesDataWorks Summit
This document discusses considerations for scaling Hadoop platforms at Yahoo. It covers topics such as deployment models (on-premise vs. public cloud), total cost of ownership, hardware configuration, networking, software stack, security, data lifecycle management, metering and governance, and debunking myths. The key takeaways are that utilization matters for cost analysis, hardware becomes increasingly heterogeneous over time, advanced networking designs are needed to avoid bottlenecks, security and access management must be flexible, and data lifecycles require policy-based management.
Similar to Part 4 - Hadoop Data Output and Reporting using OBIEE11g (20)
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
The document discusses the future of analytics, data integration, and business intelligence (BI) on big data platforms like Hadoop. It covers how BI has evolved from old-school data warehousing to enterprise BI tools to utilizing big data platforms. New technologies like Impala, Kudu, and dataflow pipelines have made Hadoop fast and suitable for analytics. Machine learning can be used for automatic schema discovery. Emerging open-source BI tools and platforms, along with notebooks, bring new approaches to BI. Hadoop has become the default platform and future for analytics.
Using Oracle Big Data Discovey as a Data Scientist's ToolkitMark Rittman
As delivered at Trivadis Tech Event 2016 - how Big Data Discovery along with Python and pySpark was used to build predictive analytics models against wearables and smart home data
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?Mark Rittman
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill; customers are using those to provide reporting over their Hadoop and relational data platforms, and looking to add capabilities such as calculation engines, data integration and federation along with in-memory caching to create complete analytic platforms. In this session we’ll look at the options that are available, compare database vendor solutions with their open-source alternative, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete “data fabric” solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Mark Rittman
The document discusses using Hadoop and NoSQL technologies like Apache HBase to perform social network analysis on Twitter data related to a company's website and blog. It describes ingesting tweet and website log data into Hadoop HDFS and processing it with tools like Hive. Graph algorithms from Oracle Big Data Spatial & Graph were then used on the property graph stored in HBase to identify influential Twitter users and communities. This approach provided real-time insights at scale compared to using a traditional relational database.
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...Mark Rittman
Mark Rittman, CTO of Rittman Mead, gave a keynote presentation on big data for Oracle developers and DBAs with a focus on Apache Spark, real-time analytics, and predictive analytics. He discussed how Hadoop can provide flexible, cheap storage for logs, feeds, and social data. He also explained several Hadoop processing frameworks like Apache Spark, Apache Tez, Cloudera Impala, and Apache Drill that provide faster alternatives to traditional MapReduce processing.
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...Mark Rittman
Mark Rittman from Rittman Mead presented on Oracle Big Data Discovery. He discussed how many organizations are running big data initiatives involving loading large amounts of raw data into data lakes for analysis. Oracle Big Data Discovery provides a visual interface for exploring, analyzing, and transforming this raw data. It allows users to understand relationships in the data, perform enrichments, and prepare the data for use in tools like Oracle Business Intelligence.
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
Mark Rittman gave a presentation on the future of analytics on Oracle Big Data Appliance. He discussed how Hadoop has enabled highly scalable and affordable cluster computing using technologies like MapReduce, Hive, Impala, and Parquet. Rittman also talked about how these technologies have improved query performance and made Hadoop suitable for both batch and interactive/ad-hoc querying of large datasets.
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the...Mark Rittman
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened-up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.
In this session we’ll focus on technical foundations around SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work along with more specialised underlying storage that each now work best with – and we’ll take a look to the future to see how SQL querying, data integration and analytics are likely to come together in the next five years to make Hadoop the default platform running mixed old-world/new-world analytics workloads.
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsMark Rittman
Mark Rittman, founder of Rittman Mead, discusses Oracle's approach to hybrid BI deployments and how it aligns with Gartner's vision of a modern BI platform. He explains how Oracle BI 12c supports both traditional top-down modeling and bottom-up data discovery. It also enables deploying components on-premises or in the cloud for flexibility. Rittman believes the future is bi-modal, with IT enabling self-service analytics alongside centralized governance.
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...Mark Rittman
This talk focus is on what a data reservoir is, how it related to the RDBMS DW, and how Big Data Discovery provides access to it to business and BI users
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsMark Rittman
This is a session for Oracle DBAs and devs that looks at the cutting edge big data techs like Spark, Kafka etc, and through demos shows how Hadoop is now a a real-time platform for fast analytics, data integration and predictive modeling
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...Mark Rittman
OBIEE12c comes with an updated version of Essbase that focuses entirely in this release on the query acceleration use-case. This presentation looks at this new release and explains how the new BI Accelerator Wizard manages the creation of Essbase cubes to accelerate OBIEE query performance
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Mark Rittman
- Mark Rittman presented on deploying full OBIEE systems to Oracle Cloud. This involves migrating the data warehouse to Oracle Database Cloud Service, updating the RPD to connect to the cloud database, and uploading the RPD to Oracle BI Cloud Service. Using the wider Oracle PaaS ecosystem allows hosting a full BI platform in the cloud.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in a quick time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed own Web Components of your design system which are created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like with standard UI5 controls, and can be bound with data binding. Learn how you can also make use of the Web Components base class in OpenUI5/SAPUI5 to also integrate your Web Components and get inspired by the solution to generate a custom UI5 library providing the Web Components control wrappers for the native ones.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Fundamentals of Programming and Language Processors
Part 4 - Hadoop Data Output and Reporting using OBIEE11g
1. Lesson 4 : Hadoop Data Output and
Reporting using OBIEE 11g
Mark Rittman, CTO, Rittman Mead
SIOUG and HROUG Conferences, Oct 2014
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
2. Moving Data In, Around and Out of Hadoop
•Three stages to Hadoop data movement, with dedicated Apache / other tools
‣Load : receive files in batch, or in real-time (logs, events)
‣Transform : process & transform data to answer questions
‣Store / Export : store in structured form, or export to RDBMS using Sqoop
RDBMS
Imports
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
Loading
Stage
!!!!
Processing
Stage
E : info@rittmanmead.com
W : www.rittmanmead.com
!!!!
Store / Export
Stage
!!!!
Real-Time
Logs / Events
File /
Unstructured
Imports
File
Exports
RDBMS
Exports
3. Discovery vs. Exploitation Project Phases
•Discovery and monetising steps in Big Data projects have different requirements
•Discovery phase
‣Unbounded discovery
‣Self-Service sandbox
‣Wide toolset
•Promotion to Exploitation
‣Commercial exploitation
‣Narrower toolset
‣Integration to operations
‣Non-functional requirements
‣Code standardisation & governance
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
4. Actionable
Events
Event Engine Enterprise
www.rittmanmead.com inquiries@rittmanmead.com @rittmanmead
Information Store
Reporting
Discovery Lab
Actionable
Information
Actionable
Insights
Input
Events
Execution
Innovation
Discovery
Output
Events
& Data
Structured
Enterprise
Data
Other
Data
Data
Reservoir
Data Factory
Information Solution
5. Actionable
Events
Event Engine Enterprise
www.rittmanmead.com inquiries@rittmanmead.com @rittmanmead
Information Store
Reporting
Discovery Lab
Actionable
Information
Actionable
Insights
Input
Events
Execution
Innovation
Discovery
Output
Events
& Data
Data
Reservoir
Data Factory
Information Solution
6. Design Pattern : Information Solution
•Specific solution based on Big Data technologies requiring broader
integration to the wider Information Management estate
e.g. ETL pre-processor for the DW or affordably store a lower level of
grain
•Non-functional requirements more critical in this solution
•Scalable integration to IM estate
an important factor for success
•Analysis may take place in Reservoir
or Reservoir only used as an aggregator
www.rittmanmead.com inquiries@rittmanmead.com @rittmanmead www.facebook.com/rittmanmead
7. Options for Sharing Hadoop Output with Wider Audience
•During the discovery phase of a Hadoop project, audience are likely technical
‣Most comfortable with data analyst tools, command-line, low-level access to the data
•During the exploitation phase, audience will be less technical
‣Emphasis on graphical tools, and integration with wider reporting toolset + metadata
•Three main options for visualising and sharing Hadoop data
1.Coming Soon - Oracle Big Data Discovery (Endeca on Hadoop)
2.OBIEE reporting against Hadoop
direct using Hive/Impala, or Oracle Big Data SQL
3.OBIEE reporting against an export of the
Hadoop data, on Exalytics / RDBMS
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
8. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Oracle Big Data Discovery
•Launched at Oracle Openworld 2014 as “The Visual Face of Hadoop”
•Combined Endeca Server search + analytical technology with Spark data transformation
9. Interactive Analysis & Exploration of Hadoop Data
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
10. Share and Collaborate on Big Data Discovery Projects
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
11. Oracle Business Analytics and Big Data Sources
•OBIEE 11g can also make use of big data sources
‣OBIEE 11.1.1.7+ supports Hive/Hadoop as a data source
‣Oracle R Enterprise can expose R models through DB functions, columns
‣Oracle Exalytics has InfiniBand connectivity to Oracle BDA
•Endeca Information Discovery can analyze unstructured and semi-structured sources
‣Increasingly tighter-integration between
OBIEE and Endeca
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
12. New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive
•MapReduce jobs are typically written in Java, but Hive can make this simpler
•Hive is a query environment over Hadoop/MapReduce to support SQL-like queries
•Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically
creates MapReduce jobs against data previously loaded into the Hive HDFS tables
•Approach used by ODI and OBIEE to gain access to Hadoop data
•Allows Hadoop data to be accessed just like any other data source
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
13. Importing Hadoop/Hive Metadata into RPD
•HiveODBC driver has to be installed into Windows environment, so that
BI Administration tool can connect to Hive and return table metadata
•Import as ODBC datasource, change physical DB type to Apache Hadoop afterwards
•Note that OBIEE queries cannot span >1 Hive schema (no table prefixes)
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
1
2
3
14. Set up ODBC Connection at the OBIEE Server
•OBIEE 11.1.1.7+ ships with HiveODBC drivers, need to use 7.x versions though (only Linux
supported)
•Configure the ODBC connection in odbc.ini, name needs to match RPD ODBC name
•BI Server should then be able to connect to the Hive server, and Hadoop/MapReduce
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
bigdatalite=Oracle 7.1 Apache Hive Wire Protocol
[bigdatalite]
Driver=/u01/app/Middleware/Oracle_BI1/common/ODBC/
Merant/7.0.1/lib/ARhive27.so
Description=Oracle 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=default
DefaultLongDataBuffLen=1024
EnableLongDataBuffLen=1024
EnableDescribeParam=0
Hostname=bigdatalite
LoginTimeout=30
MaxVarcharSize=2000
PortNumber=10000
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
15. Dealing with Hadoop / Hive Latency Option 1 : Impala
•Hadoop access through Hive can be slow - due to inherent latency in Hive
•Hive queries use MapReduce in the background to query Hadoop
•Spins-up Java VM on each query
•Generates MapReduce job
•Runs and collates the answer
•Great for large, distributed queries ...
•... but not so good for “speed-of-thought” dashboards
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
16. Dealing with Hadoop / Hive Latency Option 1 : Use Impala
•Hive is slow - because it’s meant to be used for batch-mode queries
•Many companies / projects are trying to improve Hive - one of which is Cloudera
•Cloudera Impala is an open-source but
commercially-sponsored in-memory MPP platform
•Replaces Hive and MapReduce in the Hadoop stack
•Can we use this, instead of Hive, to access Hadoop?
‣It will need to work with OBIEE
‣Warning - it won’t be a supported data source (yet…)
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
17. Demo
Using Hive to Provide Data for OBIEE11g
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
18. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
How Impala Works
•A replacement for Hive, but uses Hive concepts and
data dictionary (metastore)
•MPP (Massively Parallel Processing) query engine
that runs within Hadoop
‣Uses same file formats, security,
resource management as Hadoop
•Processes queries in-memory
•Accesses standard HDFS file data
•Option to use Apache AVRO, RCFile,
LZO or Parquet (column-store)
•Designed for interactive, real-time
SQL-like access to Hadoop
BI Server
Presentation Svr
Impala
Hadoop
HDFS etc
Cloudera Impala
ODBC Driver
Impala
Hadoop
HDFS etc
Impala
Hadoop
HDFS etc
Impala
Hadoop
HDFS etc
Impala
Hadoop
HDFS etc
Multi-Node
Hadoop Cluster
19. Connecting OBIEE 11.1.1.7 to Cloudera Impala
•Warning - unsupported source - limited testing and no support from MOS
•Requires Cloudera Impala ODBC drivers - Windows or Linux (RHEL etc/SLES) - 32/64 bit
•ODBC Driver / DSN connection steps similar to Hive
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
20. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Importing Impala Metadata
•Import Impala tables (via the Hive metastore) into RPD
•Set database type to “Apache Hadoop”
‣Warning - don’t set ODBC type to Hadoop- leave at ODBC 2.0
‣Create physical layer keys, joins etc as normal
21. Importing RPD using Impala Metadata
•Create BMM layer, Presentation layer as normal
•Use “View Rows” feature to check connectivity back to Impala / Hadoop
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
22. Impala / OBIEE Issue with ORDER BY Clause
•Although checking rows in the BI Administration tool worked, any query that aggregates
data in the dashboard will fail
•Issue is that Impala requires LIMIT with all ORDER BY clauses
‣OBIEE could use LIMIT, but doesn’t for Impala
at the moment (because not supported)
•Workaround - disable ORDER BY in
Database Features, have the BI Server do sorting
‣Not ideal - but it works, until Impala supported
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
23. Demo
Using Impala to Provide Data for OBIEE11g
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
24. So Does Impala Work, as a Hive Substitute?
•With ORDER BY disabled in DB features, it appears to
•But not extensively tested by me, or Oracle
•But it’s certainly interesting
•Reduces 30s, 180s queries down to 1s, 10s etc
•Impala, or one of the competitor projects
(Drill, Dremel etc) assumed to be the
real-time query replacement for Hive, in time
‣Oracle announced planned support for
Impala at OOW2013 - watch this space
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
25. Dealing with Hadoop / Hive Latency Option 2 : Big Data SQL
•Preferred solution for customers with Oracle Big Data Appliance is Big Data SQL
•Oracle SQL Access to both relational, and Hive/NoSQL data sources
•Exadata-type SmartScan against Hadoop datasets
•Response-time equivalent to Impala or Hive on Tez
•No issues around HiveQL limitations
•Insulates end-users around differences
between Oracle and Hive datasets
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
26. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Oracle Big Data SQL
•Part of Oracle Big Data 4.0 (BDA-only)
‣Also requires Oracle Database 12c, Oracle Exadata Database Machine
•Extends Oracle Data Dictionary to cover Hive
•Extends Oracle SQL and SmartScan to Hadoop
•Extends Oracle Security Model over Hadoop
‣Fine-grained access control
‣Data redaction, data masking
‣Uses fast c-based readers where possible
(vs. Hive MapReduce generation)
‣Map Hadoop parallelism to Oracle PQ
‣Big Data SQL engine works on top of YARN
Exadata
Storage Servers
‣Like Spark, Tez, MR2
Exadata Database
Server
Hadoop
Cluster
Oracle Big
Data SQL
SQL Queries
SmartScan SmartScan
27. Big Data SQL Server Dataflow
•Read data from HDFS Data Node
‣Direct-path reads
‣C-based readers when possible
‣Use native Hadoop classes otherwise
•Translate bytes to Oracle
•Apply SmartScan to Oracle bytes
‣Apply filters
‣Project columns
‣Parse JSON/XML
‣Score models Disks%
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Big$Data$SQL$Server$
Smart$Scan$
External$Table$Services$
SerDe%
RecordReader%
Data$Node$
10110010% 10110010% 10110010%
3%
2%
1%
1
2
3
28. View Hive Table Metadata in the Oracle Data Dictionary
•Oracle Database 12c 12.1.0.2.0 with Big Data SQL option can view Hive table metadata
‣Linked by Exadata configuration steps to one or more BDA clusters
•DBA_HIVE_TABLES and USER_HIVE_TABLES exposes Hive metadata
•Oracle SQL*Developer 4.0.3, with Cloudera Hive drivers, can connect to Hive metastore
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
SQL> col database_name for a30
SQL> col table_name for a30
SQL> select database_name, table_name
2 from dba_hive_tables;
!
DATABASE_NAME TABLE_NAME
------------------------------ ------------------------------
default access_per_post
default access_per_post_categories
default access_per_post_full
default apachelog
default categories
default countries
default cust
default hive_raw_apache_access_log
29. Hive Access through Oracle External Tables + Hive Driver
•Big Data SQL accesses Hive tables through external table mechanism
‣ORACLE_HIVE external table type imports Hive metastore metadata
‣ORACLE_HDFS requires metadata to be specified
•Access parameters cluster and tablename specify Hive table source and BDA cluster
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
CREATE TABLE access_per_post_categories(
hostname varchar2(100),
request_date varchar2(100),
post_id varchar2(10),
title varchar2(200),
author varchar2(100),
category varchar2(100),
ip_integer number)
organization external
(type oracle_hive
default directory default_dir
access parameters(com.oracle.bigdata.tablename=default.access_per_post_categories));
30. Demo
Using Big Data SQL to Provide Data for OBIEE11g
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
31. Alternative to Direct Against Hadoop : Export to Data Mart
•In most cases, for general reporting access, exporting into RDBMS makes sense
•Export Hive data from Hadoop into Oracle Data Mart or Data Warehouse
•Use Oracle RDBMS for high-value data analysis, full access to RBDMS optimisations
•Potentially use Exalytics for in-memory RBDMS access
RDBMS
Imports
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
Loading
Stage
!!!!
Processing
Stage
E : info@rittmanmead.com
W : www.rittmanmead.com
!!!!
Store / Export
Stage
!!!!
Real-Time
Logs / Events
File /
Unstructured
Imports
File
Exports
RDBMS
Exports
32. Using the Right Server for the Right Job
•Hadoop for large scale, high-speed data ingestion and processing
•Oracle RDBMS and Exadata for long-term storage of high-value data
•Oracle Exalytics for speed-of-though analytics in TimesTen and Oracle Essbase
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
33. Bulk Unload Summary Data to Oracle Database
•Final requirement is to unload final Hive table contents to Oracle Database
•Several use-cases for this:
•Use Hadoop / BDA for ETL offloading
•Use analysis capabilities of BDA, but then output results to RDBMS data mart or DW
•Permit use of more advanced SQL query tools
•Share results with other applications
•Can use Sqoop for this, or use Oracle Big Data Connectors
•Fast bulk unload, or transparent Oracle access to Hive
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
5
34. Oracle Direct Connector for HDFS
•Enables HDFS as a data-source for Oracle Database external tables
•Effectively provides Oracle SQL access over HDFS
•Supports data query, or import into Oracle DB
•Treat HDFS-stored files in the same way as regular files
•But with HDFS’s low-cost
•… and fault-tolerance
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
35. Oracle Loader for Hadoop (OLH)
•Oracle technology for accessing Hadoop data, and loading it into an Oracle database
•Pushes data transformation, “heavy lifting” to the Hadoop cluster, using MapReduce
•Direct-path loads into Oracle Database, partitioned and non-partitioned
•Online and offline loads
•Load from HDFS or Hive tables
•Key technology for fast load of
Hadoop results into Oracle DB
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
36. IKM File/Hive to Oracle (OLH/ODCH)
•KM for accessing HDFS/Hive data from Oracle
•Either sets up ODCH connectivity, or bulk-unloads via OLH
•Map from HDFS or Hive source to Oracle tables (via Oracle technology in Topology)
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
37. Environment Variable Requirements
•Hardest part in setting up OLH / IKM File/Hive to Oracle is getting environment variables
correct - OLH needs to be able to see correct JARs, configuration files
•Set in /home/oracle/.bashrc - see example below
export HIVE_HOME=/usr/lib/hive
export HADOOP_CLASSPATH=/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/*:/etc/hive/conf:$HIVE_HOME/lib/
hive-metastore-0.12.0-cdh5.0.1.jar:$HIVE_HOME/lib/libthrift.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/
lib/hive-common-0.12.0-cdh5.0.1.jar:$HIVE_HOME/lib/hive-exec-0.12.0-cdh5.0.1.jar
export OLH_HOME=/home/oracle/oracle/product/oraloader-3.0.0-h2
export HADOOP_HOME=/usr/lib/hadoop
export JAVA_HOME=/usr/java/jdk1.7.0_60
export ODI_HIVE_SESSION_JARS=/usr/lib/hive/lib/hive-contrib.jar
export ODI_OLH_JARS=/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/ojdbc6.jar,/home/oracle/oracle/
product/oraloader-3.0.0-h2/jlib/orai18n.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/orai18n-utility.
jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/orai18n-mapping.jar,/home/oracle/oracle/
product/oraloader-3.0.0-h2/jlib/orai18n-collation.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/
oraclepki.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/osdt_cert.jar,/home/oracle/oracle/product/
oraloader-3.0.0-h2/jlib/osdt_core.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/commons-math-
2.2.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/jackson-core-asl-1.8.8.jar,/home/oracle/
oracle/product/oraloader-3.0.0-h2/jlib/jackson-mapper-asl-1.8.8.jar,/home/oracle/oracle/product/
oraloader-3.0.0-h2/jlib/avro-1.7.3.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/avro-mapred-1.7.3-
hadoop2.jar,/home/oracle/oracle/product/oraloader-3.0.0-h2/jlib/oraloader.jar,/usr/lib/hive/lib/hive-metastore.
jar,/usr/lib/hive/lib/libthrift-0.9.0.cloudera.2.jar,/usr/lib/hive/lib/libfb303-0.9.0.jar,/usr/lib/
hive/lib/hive-common-0.12.0-cdh5.0.1.jar,/usr/lib/hive/lib/hive-exec.jar
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
38. Configuring the KM Physical Settings
•For the access table in Physical view, change LKM to LKM SQL Multi-Connect
•Delegates the multi-connect capabilities to the downstream node, so you can use a multi-connect
IKM such as IKM File/Hive to Oracle
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
39. Configuring the KM Physical Settings
•For the target table, select IKM File/Hive to Oracle
•Only becomes available to select once
LKM SQL Multi-Connect selected for access table
•Key option values to set are:
•OLH_OUTPUT_MODE (use JDBC initially, OCI
if Oracle Client installed on Hadoop client node)
•MAPRED_OUTPUT_BASE_DIR (set to directory
on HFDS that OS user running ODI can access)
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
40. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Executing the Mapping
•Executing the mapping will invoke
OLH from the OS command line
•Hive table (or HDFS file) contents
copied to Oracle table
41. Demo
Unloading Hive Data to Oracle using OLH / ODI12c
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
42. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Thank You for Attending!
•Thank you for attending this presentation, and more information can be found at http://
www.rittmanmead.com
•Contact us at info@rittmanmead.com or mark.rittman@rittmanmead.com
•Look out for our book, “Oracle Business Intelligence Developers Guide” out now!
•Follow-us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)
43. Lesson 4 : Hadoop Data Output
and Reporting OBIEE 11g
Mark Rittman, CTO, Rittman Mead
SIOUG and HROUG Conferences, Oct 2014
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com