The document discusses Oracle Big Data Discovery and how it can be used to analyze and gain insights from data stored in a Hadoop data reservoir. It provides an example scenario where Big Data Discovery is used to analyze website logs, tweets, and website posts and comments to understand popular content and influencers for a company. The data is ingested into the Big Data Discovery tool, which automatically enriches the data. Users can then explore the data, apply additional transformations, and visualize relationships to gain insights.
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora... (Mark Rittman)
This talk focuses on what a data reservoir is, how it relates to the RDBMS data warehouse, and how Big Data Discovery gives business and BI users access to it.
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di... (Mark Rittman)
As presented at OGh SQL Celebration Day 2016 - including new content on why NoSQL and Hadoop are a better solution for social network analysis than the Oracle Database (for now...)
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive Analytics (Mark Rittman)
This is a session for Oracle DBAs and developers that looks at cutting-edge big data technologies such as Spark and Kafka, and shows through demos how Hadoop is now a real-time platform for fast analytics, data integration and predictive modeling.
Gluent New World #02 - SQL-on-Hadoop : A bit of History, Current State-of-the... (Mark Rittman)
Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill, along with proprietary solutions from closed-source vendors, have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.
In this session we’ll focus on the technical foundations of SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work, along with the more specialised underlying storage formats each now works best with. We’ll also look to the future to see how SQL querying, data integration and analytics are likely to come together over the next five years to make Hadoop the default platform for running mixed old-world/new-world analytics workloads.
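The "schema-on-read" idea at the heart of Hive's SQL layer can be sketched in a few lines of plain Python. This is only an illustration under invented data: the log lines, column names and query are made up, and real Hive compiles SQL into distributed jobs over HDFS rather than looping in one process.

```python
# Sketch: "schema-on-read" as Hive applies it. Raw text files stay untouched
# on storage; a table definition is just metadata telling the engine how to
# parse each line at query time. Data and columns below are invented.
import csv
import io
from collections import defaultdict

RAW_LOG = """\
2016-06-01,page_view,/blog/hadoop
2016-06-01,page_view,/blog/spark
2016-06-02,click,/blog/hadoop
"""

# In Hive, "CREATE EXTERNAL TABLE logs (dt STRING, event STRING, url STRING)"
# amounts to metadata like this -- the underlying file is never rewritten.
COLUMNS = ["dt", "event", "url"]

def scan(raw_text):
    """Apply the schema to raw lines at read time (schema-on-read)."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield dict(zip(COLUMNS, row))

# Equivalent of:
#   SELECT url, COUNT(*) FROM logs WHERE event = 'page_view' GROUP BY url
counts = defaultdict(int)
for rec in scan(RAW_LOG):
    if rec["event"] == "page_view":
        counts[rec["url"]] += 1

print(dict(counts))  # {'/blog/hadoop': 1, '/blog/spark': 1}
```

The key point of the session's history lesson survives even in this toy: the schema lives in metadata, not in the files, so the same raw data can be queried with different table definitions.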
Using Oracle Big Data Discovery as a Data Scientist's Toolkit (Mark Rittman)
As delivered at Trivadis Tech Event 2016 - how Big Data Discovery, along with Python and PySpark, was used to build predictive analytics models against wearables and smart home data
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh... (Mark Rittman)
As presented at OGh SQL Celebration Day in June 2016, NL. Covers new features in Big Data SQL including storage indexes, storage handlers and ability to install + license on commodity hardware
What is Big Data Discovery, and how it complements traditional business anal... (Mark Rittman)
Data Discovery is an analysis technique that complements traditional business analytics, enabling users to combine, explore and analyse disparate datasets to spot opportunities and patterns that lie hidden within their data. Oracle Big Data Discovery takes this idea and applies it to your unstructured and big data datasets, giving users a way to catalogue, join and then analyse all types of data across your organization.
In this session we'll look at Oracle Big Data Discovery, how it provides a "visual face" to your big data initiatives, and how it complements and extends the work that you currently do using business analytics tools.
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future? (Mark Rittman)
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill. Customers are using these to provide reporting over their Hadoop and relational data platforms, and are looking to add capabilities such as calculation engines, data integration and federation, along with in-memory caching, to create complete analytic platforms. In this session we’ll look at the options that are available, compare database vendor solutions with their open-source alternatives, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete “data fabric” solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
A series of tweets I posted about my 11-hour struggle to make a cup of tea with my WiFi kettle ended up going viral, was picked up by the national and then international press, and led to thousands of retweets, comments and references in the media. In this session we’ll take the data I recorded on this Twitter activity over the period and use Oracle Big Data Spatial and Graph to understand what caused the breakout and the tweet going viral, who the key influencers and connectors were, and how the tweet spread over time and over geography from my original series of posts in Hove, England.
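The influencer analysis described in that abstract reduces, in its simplest form, to a centrality measure on a retweet graph. A minimal sketch, with invented handles and edges (the real session used Oracle Big Data Spatial & Graph's property-graph algorithms, not this loop):

```python
# Sketch: who amplified whom. An account's in-degree on the retweet graph --
# how many distinct users retweeted it -- is the crudest influencer measure.
# All handles and edges below are invented for illustration.
from collections import Counter

# (retweeter, original_author) pairs
retweets = [
    ("alice", "markrittman"), ("bob", "markrittman"),
    ("carol", "markrittman"), ("bob", "alice"),
    ("dave", "carol"),
]

in_degree = Counter(author for _, author in retweets)
influencers = in_degree.most_common()
print(influencers)  # [('markrittman', 3), ('alice', 1), ('carol', 1)]
```

Real graph engines go further (PageRank, betweenness centrality to find the "connectors" the abstract mentions), but in-degree already separates the origin of a viral tweet from its amplifiers.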
Turn Data Into Actionable Insights - StampedeCon 2016 (StampedeCon)
At Monsanto, emerging technologies such as IoT, advanced imaging and geo-spatial platforms, together with molecular breeding, ancestry and genomics data sets, have made us rethink how we approach developing, deploying, scaling and distributing our software to accelerate predictive and prescriptive decisions. We created a cloud-based data science platform for the enterprise to address this need. Our primary goals were to perform analytics at scale and to integrate analytics with our core product platforms.
As part of this talk, we will share our journey of transformation, showing how we enabled: a collaborative discovery analytics environment for data science teams to perform model development; provisioning of data through APIs and streams; deployment of models to production through our auto-scaling big data compute in the cloud to perform streaming, cognitive, predictive, prescriptive, historical and batch analytics at scale; and integration of analytics with our core product platforms to turn data into actionable insights.
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle... (Rittman Analytics)
Most DBAs are aware that something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves, and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata and analytics on this platform, and why IaaS and data-warehousing-as-a-service will have such a big impact, sooner than you think.
ODI12c as your Big Data Integration Hub (Mark Rittman)
Presentation from the recent Oracle OTN Virtual Technology Summit, on using Oracle Data Integrator 12c to ingest, transform and process data on a Hadoop cluster.
Innovation in the Data Warehouse - StampedeCon 2016 (StampedeCon)
Enterprise Holdings first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with big data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points necessary for your own cluster, for example: cloud vs on premises, physical vs virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned about which pieces of our architecture worked well and rant about those which didn’t. No deep Hadoop knowledge is necessary; the talk is pitched at an architect or executive level.
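Of the three architectures that abstract names, the lambda architecture is the one most easily shown in miniature: a batch layer holds totals precomputed up to the last batch run, a speed layer holds counts for events that arrived since, and the serving layer merges the two at query time. A sketch with invented figures (the kappa architecture, by contrast, drops the batch layer and recomputes by replaying the stream):

```python
# Sketch of the lambda architecture: batch view (recomputed periodically,
# e.g. nightly) plus speed view (incremental, real-time), merged per query.
# Metric names and numbers are invented for illustration.
batch_view = {"page_views": 10_000, "signups": 120}  # as of last batch run
speed_view = {"page_views": 42, "signups": 3}        # events since then

def query(metric):
    """Serving layer: merge batch and speed views for a complete answer."""
    return batch_view.get(metric, 0) + speed_view.get(metric, 0)

print(query("page_views"))  # 10042
```

The cost the talk's "decision points" allude to is visible even here: every metric must be maintained twice, once in each layer, which is exactly the duplication kappa tries to eliminate.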
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... (StampedeCon)
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
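The "historically compatible schemas" this platform relies on come from the registry rejecting incompatible changes. One common backward-compatibility rule can be sketched as follows; the schemas here are simplified dicts rather than real Avro, and the check covers only one of several rules a real registry (such as Confluent's) enforces:

```python
# Sketch: one backward-compatibility rule for Avro-style schema evolution.
# A field added in a new schema version is only safe if it has a default,
# so consumers on the new schema can still decode records written with the
# old one. Schemas below are simplified dicts, not real Avro definitions.
def backward_compatible(old_fields, new_fields):
    old_names = {f["name"] for f in old_fields}
    for f in new_fields:
        if f["name"] not in old_names and "default" not in f:
            return False  # new required field: old records can't supply it
    return True

v1 = [{"name": "user_id", "type": "long"}]
v2_ok = v1 + [{"name": "country", "type": "string", "default": "unknown"}]
v2_bad = v1 + [{"name": "country", "type": "string"}]

print(backward_compatible(v1, v2_ok))   # True
print(backward_compatible(v1, v2_bad))  # False
```

This is why the abstract can promise that Hive table maintenance "becomes much more reliable": every record ever written remains decodable by the current schema, so downstream ETL never breaks on old data.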
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa (larsgeorge)
Keynote during BiDaTA 2013 in Genoa, a special track of the ADBIS 2013 conference. URL: http://dbdmg.polito.it/bidata2013/index.php/keynote-presentation
Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ... (Mark Rittman)
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014
In this presentation we cover some key Hadoop concepts including HDFS, MapReduce, Hive and NoSQL/HBase, with the focus on Oracle Big Data Appliance and Cloudera Distribution including Hadoop. We explain how data is stored on a Hadoop system and the high-level ways it is accessed and analysed, and outline Oracle’s products in this area including the Big Data Connectors, Oracle Big Data SQL, and Oracle Business Intelligence (OBI) and Oracle Data Integrator (ODI).
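The MapReduce concept the seminar introduces can be shown in plain Python: map emits key/value pairs, a shuffle groups them by key, and reduce aggregates each group. This is the classic word-count example with invented input; a real Hadoop job distributes each phase across the cluster, but the dataflow is identical:

```python
# Sketch of the MapReduce dataflow: map -> shuffle (sort/group) -> reduce.
# Input documents are invented; Hadoop runs these phases across many nodes.
from itertools import groupby

docs = ["big data on hadoop", "sql on hadoop"]

# Map phase: each input record emits (word, 1) pairs
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle phase: sort and group pairs by key
# (Hadoop performs this sort/merge between the map and reduce phases)
shuffled = groupby(sorted(mapped), key=lambda kv: kv[0])

# Reduce phase: sum the values for each key
counts = {word: sum(v for _, v in pairs) for word, pairs in shuffled}
print(counts["hadoop"])  # 2
```

Hive, covered in the same seminar, historically compiled SQL statements down to exactly this kind of map/shuffle/reduce pipeline.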
How to get started in Big Data without Big Costs - StampedeCon 2016 (StampedeCon)
Looking to implement Hadoop but haven’t pulled the trigger yet? You are not alone. Many companies have heard the hype about how Hadoop can solve the challenges presented by big data, but few have actually implemented it. What’s preventing them from taking the plunge? Can it be done in small steps to ensure project success?
This session will discuss some of the items to consider when getting started with Hadoop and how to go about making the decision to move to the de facto big data platform. Starting small can be a good approach when your company is learning the basics and deciding what direction to take. There is no need to invest large amounts of time and money up front if a proof of concept is all you aim to provide. Using well known data sets on virtual machines can provide a low cost and effort implementation to know if your big data journey will be successful with Hadoop.
Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar... (Mark Rittman)
Presentation from the Rittman Mead BI Forum 2015 masterclass, part 2 of a two-part session that also covered creating the Discovery Lab. Goes through setting up Flume log and Twitter feeds into CDH5 Hadoop using the ODI12c Advanced Big Data Option, then looks at the use of OBIEE 11g with Hive, Impala and Big Data SQL before finally using Oracle Big Data Discovery for faceted search and data mashup on top of Hadoop.
Big Data 2.0: ETL & Analytics: Implementing a next generation platform (Caserta)
In our most recent Big Data Warehousing Meetup, we learned about transitioning from Big Data 1.0 with Hadoop 1.x with nascent technologies to the advent of Hadoop 2.x with YARN to enable distributed ETL, SQL and Analytics solutions. Caserta Concepts Chief Architect Elliott Cordo and an Actian Engineer covered the complete data value chain of an Enterprise-ready platform including data connectivity, collection, preparation, optimization and analytics with end user access.
Access additional slides from this meetup here:
http://www.slideshare.net/CasertaConcepts/big-data-warehousing-meetup-january-20
For more information on our services or upcoming events, please visit http://www.actian.com/ or http://www.casertaconcepts.com/.
Growth is the process of increase in the number and/or size of cells; it is irreversible and can be measured (expressed in numbers, graphs, etc.).
Development is the process of progressing towards maturity; it cannot be measured, only observed.
Mukjizat Nabi Muhammad SAW dan Penafian Terhadap Dakwaan Orientalis (Ezad Azraai Jamsari)
Distance-learning (PBJJ) lecture notes for PPPY1272 Fiqh Sirah, a compulsory course from the Department of Arabic Studies and Islamic Civilization, Faculty of Islamic Studies, Universiti Kebangsaan Malaysia.
5th International Disaster and Risk Conference IDRC 2014: Integrative Risk Management - The role of science, technology & practice, 24-28 August 2014 in Davos, Switzerland
KScope14 - Real-Time Data Warehouse Upgrade - Success Stories (Michael Rainey)
Providing real-time data to its global customers is a necessity for IFPI (International Federation of the Phonographic Industry), a not-for-profit organization with a mission to safeguard the rights of record producers and promote the value of recorded music. Using Oracle Streams and Oracle Warehouse Builder (OWB) for real-time data replication and integration, meeting this goal was becoming a challenge. The solution was difficult to maintain and overall throughput was degrading as data volume increased. The need for greater stability and performance led IFPI to implement Oracle GoldenGate and Oracle Data Integrator. This session will describe the innovative approach taken to complete the migration from a Streams and OWB implementation to a more robust, maintainable, and performant GoldenGate and ODI integrated solution.
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse (Formant)
Datavail and SlamData present on how to use NoSQL technologies (MongoDB and SlamData) to build a Data Hub -- the fast and easy way to real-time business insight.
Part 4 - Hadoop Data Output and Reporting using OBIEE11g (Mark Rittman)
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
Once insights and analysis have been produced within your Hadoop cluster by analysts and technical staff, it’s usually the case that you want to share the output with a wider audience in the organisation. Oracle Business Intelligence has connectivity to Hadoop through Apache Hive compatibility, and other Oracle tools such as Oracle Big Data Discovery and Big Data SQL can be used to visualise and publish Hadoop data. In this final session we’ll look at what’s involved in connecting these tools to your Hadoop environment, and also consider where data is optimally located when large amounts of Hadoop data need to be analysed alongside more traditional data warehouse datasets.
OBIEE, Endeca, Hadoop and ORE Development (on Exalytics) - ODTUG 2013 (Mark Rittman)
A presentation from ODTUG 2013 on tools other than OBIEE for Exalytics, focusing on analysis of non-traditional data via Endeca, "big data" via Hadoop and statistical analysis / predictive modeling through Oracle R Enterprise, and the benefits of running these tools on Oracle Exalytics
Leveraging Hadoop with OBIEE 11g and ODI 11g - UKOUG Tech'13 (Mark Rittman)
The latest releases of OBIEE and ODI come with the ability to connect to Hadoop data sources, using MapReduce to integrate data from clusters of "big data" servers complementing traditional BI data sources. In this presentation, we will look at how these two tools connect to Apache Hadoop and access "big data" sources, and share tips and tricks on making it all work smoothly.
How you can gain rapid insights and create more flexibility by capturing and storing data from a variety of sources and structures into a NoSQL database.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Practical Tips for Oracle Business Intelligence Applications 11g ImplementationsMichael Rainey
It's time to move to the Oracle Data Integrator version of Oracle Business Intelligence Applications! This has been the theme since it was recently announced that Oracle BI Applications 11g will move forward without updating functionality in the Informatica version of the product. Implementing Oracle BI Applications can be quite a challenge, specifically with Oracle Data Integrator being the “new” ETL tool. This session will provide attendees with practical tips, based on real-world experience, to help them get started with their implementation. How and why to use Oracle GoldenGate, high availability considerations, disaster recovery setup, and other functional and design factors will be covered, enhancing the attendee's ability to make the best design decisions for their BI Applications project.
Presented at KScope15 & NWOUG 2015.
Are You Killing the Benefits of Your Data Lake?Denodo
Watch the full webinar on-demand here: https://goo.gl/RL1ZSa
Data lakes are centralized data repositories. Data needed by data scientists is physically copied to a data lake, which serves as a single storage environment. This way, data scientists can access all the data from only one entry point – a one-stop shop to get the right data. However, such an approach is not always feasible for all the data, and limits its use solely to data scientists, making it a single-purpose system.
So, what’s the solution?
A multi-purpose data lake allows a broader and deeper use of the data lake without minimizing the potential value for data science and without making it an inflexible environment.
Attend this session to learn:
• Disadvantages and limitations that are weakening or even killing the potential benefits of a data lake.
• Why a multi-purpose data lake is essential in building a universal data delivery system.
• How to build a logical multi-purpose data lake using data virtualization.
Do not miss this opportunity to make your data lake project successful and beneficial.
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Presentation by Mark Rittman, Technical Director, Rittman Mead, on ODI 11g features that support enterprise deployment and usage. Delivered at BIWA Summit 2013, January 2013.
Deep-Dive into Big Data ETL with ODI12c and Oracle Big Data ConnectorsMark Rittman
Presented at Oracle Openworld 2014 - a look at the ETL process within a Hadoop cluster, how data gets in, out and around, and how ODI12c and Oracle's Big Data Connectors can be used for this process.
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
Robin Bloor and Teradata
Live Webcast on April 22, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=2e69345c0a6a4e5a8de6fc72652e3bc6
Can you replace the data warehouse with Hadoop? Is Hadoop an ideal ETL subsystem? And what is the real magic of Hadoop? Everyone is looking to capitalize on the insights that lie in the vast pools of big data. Generating the value of that data relies heavily on several factors, especially choosing the right solution for the right context. With so many options out there, how do organizations best integrate these new big data solutions with the existing data warehouse environment?
Register for this episode of The Briefing Room to hear veteran analyst Dr. Robin Bloor as he explains where Hadoop fits into the information ecosystem. He’ll be briefed by Dan Graham of Teradata, who will offer perspective on how Hadoop can play a critical role in the analytic architecture. Bloor and Graham will interactively discuss big data in the big picture of the data center and will also seek to dispel several common misconceptions about Hadoop.
Visit InsideAnalysis.com for more information.
Big Data Warehousing Meetup: Real-time Trade Data Monitoring with Storm & Cas...Caserta
Caserta Concepts' implementation team presented a solution that performs big data analytics on active trade data in real-time. They presented the core components – Storm for the real-time ingest, Cassandra, a NoSQL database, and others. For more information on future events, please check out http://www.casertaconcepts.com/.
UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12cMark Rittman
Slides from my 2hr session at the UKOUG Tech'14 Super Sunday event, covering Hadoop basics and use of Oracle Data Integrator 12c for ETL on the Hadoop platform. Also some coverage of Oracle big data product announcements from OOW2014.
A Walk Through the Kimball ETL Subsystems with Oracle Data IntegrationMichael Rainey
Big Data integration is an excellent feature in the Oracle Data Integration product suite (Oracle Data Integrator, GoldenGate, & Enterprise Data Quality). But not all analytics require big data technologies, such as labor cost, revenue, or expense reporting. Ralph Kimball, an original architect of the dimensional model in data warehousing, spent much of his career working to build an enterprise data warehouse methodology that can meet these reporting needs. His book, "The Data Warehouse ETL Toolkit", is a guide for many ETL developers. This session will walk you through his ETL Subsystem categories; Extracting, Cleaning & Conforming, Delivering, and Managing, describing how the Oracle Data Integration products are perfectly suited for the Kimball approach.
Presented at Oracle OpenWorld 2015 & BIWA Summit 2016.
Similar to Unlock the value in your big data reservoir using oracle big data discovery and rittman mead (20)
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products such as Oracle Big Data SQL on the Oracle Big Data Appliance along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill; customers are using those to provide reporting over their Hadoop and relational data platforms, and looking to add capabilities such as calculation engines, data integration and federation along with in-memory caching to create complete analytic platforms. In this session we'll look at the options that are available, compare database vendor solutions with their open-source alternative, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete "data fabric" solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsMark Rittman
Presented at the UKOUG Business Analytics SIG Meeting in April 2016, addresses the question as to whether enterprise BI tools such as OBIEE12c are relevant in the world of Gartner BiModal Mode 1 + Mode 2 analytics, and Hybrid cloud/on-premise deployments
OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...Mark Rittman
OBIEE12c comes with an updated version of Essbase that focuses entirely in this release on the query acceleration use-case. This presentation looks at this new release and explains how the new BI Accelerator Wizard manages the creation of Essbase cubes to accelerate OBIEE query performance
Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015Mark Rittman
Presentation given at Oracle Openworld 2015 on moving an existing OBIEE11g BI platform to Oracle Public Cloud, including accompanying DW database and continuing the ETL process. Explores migration process and what's now possible in Oracle Cloud for hosting full OBIEE platforms, and looks at what the benefits of such a migration might be for customers and end-users.
OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015Mark Rittman
Slides from a two-day OBIEE11g seminar in Dubai, February 2015, at the Oracle University Expert Summit. Covers the following topics:
1. OBIEE 11g Overview & New Features
2. Adding Exalytics and In-Memory Analytics to OBIEE 11g
3. Source Control and Concurrent Development for OBIEE
4. No Silver Bullets - OBIEE 11g Performance in the Real World
5. Oracle BI Cloud Service Overview, Tips and Techniques
6. Moving to Oracle BI Applications 11g + ODI
7. Oracle Essbase and Oracle BI EE 11g Integration Tips and Techniques
8. OBIEE 11g and Predictive Analytics, Hadoop & Big Data
Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12cMark Rittman
Delivered as a one-day seminar at the SIOUG and HROUG Oracle User Group Conferences, October 2014.
There are many ways to ingest (load) data into a Hadoop cluster, from file copying using the Hadoop Filesystem (FS) shell through to real-time streaming using technologies such as Flume and Hadoop streaming. In this session we’ll take a high-level look at the data ingestion options for Hadoop, and then show how Oracle Data Integrator and Oracle GoldenGate leverage these technologies to load and process data within your Hadoop cluster. We’ll also consider the updated Oracle Information Management Reference Architecture and look at the best places to land and process your enterprise data, using Hadoop’s schema-on-read approach to hold low-value, low-density raw data, and then use the concept of a “data factory” to load and process your data into more traditional Oracle relational storage, where we hold high-density, high-value data.
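The schema-on-read approach described here can be sketched in a few lines of Python: raw records land in the reservoir untouched, and a schema is only imposed, tolerantly, at read time. The field names and records below are invented for illustration.

```python
import json

# Raw lines landed in the reservoir as-is (schema-on-read: no upfront modeling).
raw_lines = [
    '{"ip": "10.0.0.1", "url": "/blog/obiee", "bytes": 5120}',
    '{"ip": "10.0.0.2", "url": "/blog/odi"}',   # no "bytes" field
    'not-json garbage line',                    # bad record, kept in raw form
]

def read_with_schema(lines):
    """Apply a schema only at read time, tolerating jagged or bad records."""
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # skip unparseable raw records rather than failing the load
        yield {"ip": rec.get("ip"), "url": rec.get("url"),
               "bytes": rec.get("bytes", 0)}

rows = list(read_with_schema(raw_lines))
print(len(rows))                        # 2 parseable records
print(sum(r["bytes"] for r in rows))    # 5120
```

The key contrast with schema-on-write ETL is that the bad third line costs nothing at load time; the decision about what the data means is deferred to each reader.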
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
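The progression the guide describes, from basic retrieval and filtering to aggregation, can be tried with nothing more than Python's built-in sqlite3 module. The page_views table and its rows below are invented for the example.

```python
import sqlite3

# Hypothetical page_views table, used only to illustrate the progression
# from basic retrieval to filtered aggregation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (url TEXT, country TEXT, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", [
    ("/blog/obiee", "UK", 120),
    ("/blog/obiee", "US", 300),
    ("/blog/odi",   "UK",  80),
])

# Basic retrieval with filtering
rows = conn.execute(
    "SELECT url, views FROM page_views WHERE country = 'UK' ORDER BY url"
).fetchall()

# Aggregation: total views per URL, most-viewed first
totals = conn.execute(
    "SELECT url, SUM(views) FROM page_views "
    "GROUP BY url ORDER BY SUM(views) DESC"
).fetchall()
print(totals[0])   # ('/blog/obiee', 420)
```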
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
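The float-vs-bfloat16 storage comparison above can be illustrated without a GPU: bfloat16 keeps only the top 16 bits of a float32 bit pattern (8-bit exponent, 7-bit mantissa), so a pure-Python truncation shows how the storage type alone changes a reduction's result. This emulation uses truncation rather than round-to-nearest, which is a simplification.

```python
import struct

def to_bfloat16(x):
    """Emulate bfloat16 storage: keep the top 16 bits of the float32
    pattern, zero the lower 16 mantissa bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (trunc,) = struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))
    return trunc

values = [0.1] * 1000                           # exact sum is 100.0
f64_sum = sum(values)                           # double-precision accumulation
bf16_sum = sum(to_bfloat16(v) for v in values)  # bfloat16-stored elements

print(round(f64_sum, 6))    # ~100.0
print(bf16_sum)             # noticeably below 100: only 7 mantissa bits per element
```

Each stored element loses precision before the reduction even starts, which is why element-sum benchmarks treat the storage type as a variable in its own right.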
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
2. info@rittmanmead.com www.rittmanmead.com @rittmanmead 2
•Mark Rittman, Co-Founder of Rittman Mead
‣Oracle ACE Director, specialising in Oracle BI&DW
‣14 Years Experience with Oracle Technology
‣Regular columnist for Oracle Magazine
•Author of two Oracle Press Oracle BI books
‣Oracle Business Intelligence Developers Guide
‣Oracle Exalytics Revealed
‣Writer for Rittman Mead Blog :
http://www.rittmanmead.com/blog
•Email : mark.rittman@rittmanmead.com
•Twitter : @markrittman
About the Speaker
3.
•Started back in 1997 on a bank Oracle DW project
•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
•Went on to use Oracle Developer/2000 and Designer/2000
•Our initial users queried the DW using SQL*Plus
•And later on, we rolled-out Discoverer/2000 to everyone else
•And life was fun…
15+ Years in Oracle BI and Data Warehousing
4.
•Over time, this data warehouse architecture developed
•Added Oracle Warehouse Builder to automate and model the DW build
•Oracle 9i Application Server (yay!) to deliver reports and web portals
•Data Mining and OLAP in the database
•Oracle 9i for in-database ETL (and RAC)
•Data was typically loaded from Oracle RDBMS and EBS
•It was turtles (Oracle) all the way down…
The Oracle-Centric DW Architecture
5.
•Many customers and organisations are now running initiatives around “big data”
•Some are IT-led and are looking for cost-savings around data warehouse storage + ETL
•Others are “skunkworks” projects in the marketing department that are now scaling-up
•Projects now emerging from pilot exercises
•And design patterns starting to emerge
Many Organisations are Running Big Data Initiatives
6.
•Typical implementation of Hadoop and big data in an analytic context is the “data lake”
•Additional data storage platform with cheap storage, flexible schema support + compute
•Data lands in the data lake or reservoir in raw form, then minimally processed
•Data then accessed directly by “data scientists”, or processed further into DW
Common Big Data Design Pattern : “Data Reservoir”
10. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
An Interesting Question.
11.
Meanwhile, back in the real world…
12.
13.
14.
15.
Customer 360-Degree Insight
16.
17.
Data from Real-Time, Social & Internet Sources is Strange
Single Customer View
Enriched Customer Profile
Correlating
Modeling
Machine Learning
Scoring
•Typically comes in non-tabular form
•JSON, log files, key/value pairs
•Users often want it speculatively
‣Haven’t thought through final purpose
•Schema can change over time
‣Or maybe there isn’t even one
•But the end-users want it now
‣Not when your ETL team are next free
18.
•Hadoop & NoSQL better suited to exploratory analysis of newly-arrived, data-reservoir-type data
‣Flexible schema - applied by user rather than ETL
‣Cheap expandable storage for detail-level data
‣Better native support for machine-learning and data discovery tools and processes
‣Potentially a great fit for our new and emerging customer 360 datasets, and great platform for analysis
Introducing Hadoop - Cheap, Flexible Storage + Compute
20.
•Start with pilot for area of the business that needs a single view of customers
•Then, over time, iterate and build out the Customer 360-degree view
Delivering a Successful Customer 360-Degree View
Start with a business area that needs a single customer view
Obtain clear understanding of customer online & offline behaviour
Build out Predictive Models and Decision Engines to deliver value now
Build out Hadoop Data Reservoir, Feeds and link to DW + CRM
Iterate and Build-out, add new integrations, incrementally building capability
Develop and Implement Strategy, Deliver Business Value
Build DevOps Capability
Pilot & Quick Win
Create Full Production Infrastructure
Pilot (Virtualised / Commodity) Hadoop Infrastructure
21.
But … These Data Sources are Strange
Single Customer View
Enriched Customer Profile
Correlating
Modeling
Machine Learning
Scoring
•Typically comes in non-tabular form
•JSON, log files, key/value pairs
•Users often want it speculatively
‣Haven’t thought through final purpose
•Schema can change over time
‣Or maybe there isn’t even one
•But the end-users want it now
‣Not when your ETL team are next free
25.
•Data loaded into the reservoir needs preparation and curation before presenting to users
•Specialist skills typically needed to ingest and understand data - and those staff are scarce
•How do we staff and scale projects as our use of big data matures?
But … Working with Unstructured Textual Data Is Hard
29.
•Part of the acquisition of Endeca back in 2012 by Oracle Corporation
•Based on search technology and concept of “faceted search”
•Data stored in flexible NoSQL-style in-memory database called “Endeca Server”
•Added aggregation, text analytics and text enrichment features for “data discovery”
‣Explore data in raw form, loose connections, navigate via search rather than hierarchies
‣Useful to find out what is relevant and valuable in a dataset before formal modeling
What Was Oracle Endeca Information Discovery?
30.
•Proprietary database engine focused on search and analytics
•Data organized as records, made up of attributes stored as key/value pairs
•No over-arching schema, no tables, self-describing attributes
•Endeca Server hallmarks:
‣Minimal upfront design
‣Support for “jagged” data
‣Administered via web service calls
‣“No data left behind”
‣“Load and Go”
•But … limited in scale (>1m records)
‣… what if it could be rebuilt on Hadoop?
Endeca Server Technology Combined Search + Analytics
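The record model described above, self-describing key/value attributes with no over-arching schema, can be sketched with plain dictionaries. The records and the facet-counting helper below are illustrative only, not Endeca's actual API.

```python
from collections import Counter

# Self-describing records: each is just a bag of attributes, with no
# over-arching schema -- "jagged" data in Endeca Server terms.
records = [
    {"type": "blog",  "author": "mark",  "topic": "obiee"},
    {"type": "tweet", "user": "@reader", "topic": "obiee"},
    {"type": "blog",  "author": "jane"},   # no "topic" attribute at all
]

def facet_counts(records, attribute):
    """Count refinement values for one facet, ignoring records that
    simply don't carry the attribute ("no data left behind")."""
    return Counter(r[attribute] for r in records if attribute in r)

print(facet_counts(records, "topic"))   # Counter({'obiee': 2})
print(facet_counts(records, "type"))    # Counter({'blog': 2, 'tweet': 1})
```

Because a missing attribute is not an error, jagged records can be loaded first and explored later, which is the "load and go" property the slide lists.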
40.
•A visual front-end to the Hadoop data reservoir, providing end-user access to datasets
•Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster
•Visualize and search datasets to gain insights, potentially load in summary form into DW
Oracle Big Data Discovery
41.
What Does Big Data Discovery Do?
•Provide a visual catalog and search function across data in the data reservoir
•Profile and understand data, relationships, data quality issues
•Apply simple changes, enrichment to incoming data
•Visualize datasets including combinations (joins)
42.
•Start with pilot for area of the business that needs a single view of customers
•Then, over time, iterate and build out the Customer 360-degree view
Delivering a Successful Customer 360-Degree View
Start with a business area that needs a single customer view
Obtain clear understanding of customer online & offline behaviour
Build out Predictive Models and Decision Engines to deliver value now
Build out Hadoop Data Reservoir, Feeds and link to DW + CRM
Iterate and Build-out, add new integrations, incrementally building capability
Develop and Implement Strategy, Deliver Business Value
Build DevOps Capability
Pilot & Quick Win
Create Full Production Infrastructure
Pilot (Virtualised / Commodity) Hadoop Infrastructure
43.
Delivering a Successful Customer 360-Degree View
Build out Predictive Models and Decision Engines to deliver value now
Build out Hadoop Data Reservoir, Feeds and link to DW + CRM
Build DevOps Capability
44.
•Provide a visual catalog and search function across data in the data reservoir
•Profile and understand data, relationships, data quality issues
•Apply simple changes, enrichment to incoming data
•Visualize datasets including combinations (joins)
What Does Big Data Discovery Do?
45.
•Rittman Mead want to understand drivers and audience for their website
‣What is our most popular content? Who are the most in-demand blog authors?
‣Who are the influencers? What do they read?
•Three data sources in scope:
Example Scenario : Social Media Analysis
RM Website Logs Twitter Stream Website Posts, Comments etc
46.
•Datasets in Hive have to be ingested into DGraph engine before analysis, transformation
•Can either define an automatic Hive table detector process, or manually upload
•Typically ingests 1m row random sample
‣1m row sample provides > 99% confidence that the answer is within 2% of the value shown, no matter how big the full dataset (1m, 1b, 1q+)
‣Makes interactivity cheap - representative dataset
Ingesting & Sampling Datasets for the DGraph Engine
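The intuition behind a fixed-size sample, that estimate accuracy depends on the sample size rather than the population size, follows from the standard margin-of-error formula for a sampled proportion. The sketch below uses that textbook formula; the exact confidence figures quoted on the slide are BDD's own claim, not derived here.

```python
import math

def margin_of_error(n, p=0.5, z=2.576):
    """Worst-case margin of error for a proportion estimated from a
    simple random sample of size n (z = 2.576 ~ 99% confidence).
    Note it depends on the sample size, not the population size --
    the intuition behind sampling a fixed 1m rows from any dataset."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 100_000, 1_000_000):
    print(n, round(margin_of_error(n), 5))
# A 1m-row sample gives roughly a 0.13% worst-case margin, however
# large the underlying dataset is
```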
47.
•Ingested datasets are now visible in Big Data Discovery Studio
•Create new project from first dataset, then add second
View Ingested Datasets, Create New Project
48.
•Ingestion process has automatically geo-coded host IP addresses
•Other automatic enrichments run after initial discovery step, based on datatypes, content
Automatic Enrichment of Ingested Datasets
49.
•For the ACCESS_PER_POST_CAT_AUTHORS dataset, 18 attributes now available
•Combination of original attributes, and derived attributes added by enrichment process
Initial Data Exploration On Uploaded Dataset Attributes
50.
•Data ingest process automatically applies some enrichments - geocoding etc
•Can apply others from Transformation page - simple transformations & Groovy expressions
Data Transformation & Enrichment
51.
•Uses Salience text engine under the covers
•Extract terms, sentiment, noun groups, positive / negative words etc
Transformations using Text Enrichment / Parsing
52.
•Choose option to Create New Attribute, to add derived attribute to dataset
•Preview changes, then save to transformation script
Create New Attribute using Derived (Transformed) Values
53.
•Users can upload their own datasets into BDD, from MS Excel or CSV file
•Uploaded data is first loaded into Hive table, then sampled/ingested as normal
Upload Additional Datasets
54.
•Used to create a dataset based on the intersection (typically) of two datasets
•Not required to just view two or more datasets together - think of this as a JOIN and SELECT
Join Datasets On Common Attributes
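The JOIN-and-SELECT behaviour described above can be sketched as a plain inner join over two lists of records. The datasets and the key name are invented for illustration.

```python
# Hypothetical datasets sharing a "post_id" attribute
page_views = [
    {"post_id": 1, "views": 500},
    {"post_id": 2, "views": 120},
    {"post_id": 9, "views": 40},    # no matching post record: dropped
]
posts = [
    {"post_id": 1, "author": "mark"},
    {"post_id": 2, "author": "jane"},
]

def inner_join(left, right, key):
    """JOIN-and-SELECT over two record sets: keep only rows whose key
    appears in both datasets, merging their attributes."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

joined = inner_join(page_views, posts, "post_id")
print(joined)
# [{'post_id': 1, 'views': 500, 'author': 'mark'},
#  {'post_id': 2, 'views': 120, 'author': 'jane'}]
```

As the slide notes, this creates a new intersected dataset; simply viewing two datasets side by side does not require it.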
57.
•BDD Studio dashboards support faceted search across all attributes, refinements
•Auto-filter dashboard contents on selected attribute values - for data discovery
•Fast analysis and summarisation through Endeca Server technology
Faceted Search Across Entire Data Reservoir
Further refinement on “OBIEE” in post keywords; results now filtered on two refinements
58.
•Visual Analyzer also provides a form of “data discovery” for BI users
‣Similar to Tableau, Qlikview etc
‣Inspired by BI elements of OEID
•Uses OBIEE RPD as the primary datasource, so data needs to be curated + structured
•Probably a better option for users who aren’t concerned it’s “big data”
•But can still connect to Hadoop via Hive, Impala and Oracle Big Data SQL
Comparing BDD to Oracle Visual Analyzer
59.
•Data in the data reservoir typically is raw, hasn’t been organised into facts, dimensions yet
•In this initial phase, you don’t want it to be - too much up-front work with unknown data
•Later on though, users will benefit from structure and hierarchies being added to data
•But this takes work, and you need to understand cost/benefit of doing it now vs. later
Managed vs. Free-Form Data Discovery
60.
•Transformations within BDD can then be used to create curated fact + dim Hive tables
•Can be used then as a more suitable dataset for use with OBIEE RPD + Visual Analyzer
•Or exported then into Exadata or Exalytics to combine with main DW datasets
Export Prepared Datasets Back to Hive, for OBIEE + VA
61.
•Users in Visual Analyzer then have a more structured dataset to use
•Data organised into dimensions, facts, hierarchies and attributes
•Can still access Hadoop directly through Impala or Big Data SQL
•Big Data Discovery though was key to initial understanding of data
Further Analyse in Visual Analyzer for Managed Dataset
•Oracle Big Data Discovery used to go back to the raw event data and add more meaning
•Enrich data, extract nouns + terms, add reference data from files, RDBMS etc
•Understand sentiment + meaning of tweets, link disparate + loosely-coupled events
•Faceted search dashboards
Oracle BDD for Data Wrangling + Data Enrichment
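A toy illustration of the enrichment idea, assuming nothing about BDD's actual enrichment pipeline: extract candidate terms with a regex and score sentiment against a tiny hand-made lexicon. Both the lexicon and the tweet text are invented for the example.

```python
import re

# Invented mini-lexicon for illustration; BDD's enrichment is far richer
POSITIVE = {"great", "useful", "love"}
NEGATIVE = {"broken", "slow", "hate"}

def enrich(tweet):
    """Pull candidate terms from tweet text and score naive sentiment."""
    terms = re.findall(r"[A-Za-z][A-Za-z0-9']+", tweet.lower())
    score = sum(t in POSITIVE for t in terms) - sum(t in NEGATIVE for t in terms)
    return {"terms": terms, "sentiment": score}

row = enrich("Great post on OBIEE internals, love the Linux tools")
```

The enriched attributes (extracted terms, sentiment score) then become additional facetable columns alongside the raw event data.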
•Previous counts assumed that all tweet references are equally important
•But some Twitter users are far more influential than others
‣They sit at the centre of a community, and have 1000’s of followers
‣A reference by them has a massive impact on page views
‣Positive or negative comments from them drive perception
•Can we identify them?
‣Potentially “reach out” with an analyst program
‣Study which website posts go “viral”
‣Understand our audience, and the conversation, better
But Who Are The Influencers In Our Community?
•Rittman Mead website features many types of content
‣Blogs on BI, data integration, big data, data warehousing
‣Op-Eds (“OBIEE12c - Three Months In, What’s the Verdict?”)
‣Articles on a theme, e.g. performance tuning
‣Details of new courses, new promotions
•Different communities are likely to form around these content types
•Different influencers and patterns of recommendation and discovery
•Can we identify some of these communities, and segment our audience?
What Communities and Networks Are Our Audience?
Graph Example : RM Blog Post Referenced on Twitter
[Diagram: tweet linking to “Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI” propagating along “Follows” edges through the social graph, with page views rising at each hop (1,000 → 2,000 → 3,000)]
Network Effect Magnified by Extent of Social Graph
[Diagram: the same tweet shared into a larger social graph, magnifying page views (3,000 → 7,005)]
Retweets by Influential Twitter Users Drive Visits
[Diagram: a retweet (“RT: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI”) by an influential user lifts page views from 3,000 to 5,003]
Property Graph Terminology
[Diagram: tweets shown as nodes, or “vertices”, joined by directed connections, or “edges”, of type “Mentions” and “Retweets”]
•Different types of Twitter interaction could imply more or less “influence”
‣Retweet of another user’s tweet implies that person is worth quoting, or that you endorse their opinion
‣Reply to another user’s tweet could be a weaker recognition of that person’s opinion or view
‣Mention of a user in a tweet is a weaker recognition that they are part of a community / debate
Determining Influencers - Factors to Consider
Relative Importance of Edge Types Added via Weights
[Diagram: the same tweet nodes as before, now with edge properties: a “Mentions” edge with Weight = 30 and a “Retweet” edge with Weight = 100]
•Graph, spatial and raster data processing for big data
‣Runs on-prem, or in Oracle Big Data Cloud Service
‣Installable on a commodity cluster running CDH
•Data stored in Apache HBase or Oracle NoSQL DB
‣Complements Spatial & Graph in Oracle Database
‣Designed for trillions of nodes, edges etc
•Out-of-the-box spatial enrichment services
•Over 35 of the most popular graph analysis functions
‣Graph traversal, recommendations
‣Finding communities and influencers
‣Pattern matching
Oracle Big Data Spatial & Graph
Calculating Top 10 Users using Page Rank Algorithm
Top 10 influencers:
markrittman
rmoff
rittmanmead
mRainey
JeromeFr
Nephentur
borkur
BIExperte
i_m_dave
dw_pete
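For intuition, the PageRank algorithm behind this ranking can be sketched in a few lines of pure Python; the deck itself uses the built-in implementation in Oracle Big Data Spatial & Graph, and the node names below are invented.

```python
# Minimal PageRank sketch (power iteration) for illustration only.
def pagerank(edges, d=0.85, iters=50):
    nodes = {n for e in edges for n in e}
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = d * rank[n] / len(out[n])
                for t in out[n]:
                    new[t] += share
            else:  # dangling node: spread its rank evenly across the graph
                for t in nodes:
                    new[t] += d * rank[n] / len(nodes)
        rank = new
    return rank

# "a" is referenced by everyone else, so it accumulates the highest rank
ranks = pagerank([("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")])
top = max(ranks, key=ranks.get)
```

The key property PageRank captures is recursive: a user is influential if influential users reference them, not merely if many users do.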
Determining Communities via Twitter Interactions
• Clusters based on actual interaction patterns, not hashtags
• Detects real communities, not ones that exist just in theory
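One simple community-detection approach, shown here purely as an illustration (the product ships its own built-in community-detection functions), is label propagation: each user repeatedly adopts the most common label among its interaction partners, so densely-interacting groups converge on a shared label. The interaction graph below is invented.

```python
# Toy label propagation: labels flow along interaction edges until
# densely-connected groups share one label.
def label_propagation(neighbours, rounds=5):
    labels = {n: n for n in neighbours}  # start with each node as its own label
    for _ in range(rounds):
        for n in neighbours:
            counts = {}
            for m in neighbours[n]:
                counts[labels[m]] = counts.get(labels[m], 0) + 1
            labels[n] = max(counts, key=counts.get)  # adopt majority label
    return labels

# Two tight interaction clusters joined by a single weak link
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["y", "z", "c"], "y": ["x", "z"], "z": ["x", "y"],
}
labels = label_propagation(graph)
```

Because the labels come from who actually interacts with whom, the resulting groups reflect real conversation communities rather than self-declared hashtags.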
•Extend your organisation’s reach into your data with Oracle Big Data Discovery, Cloudera Hadoop and the Rittman Mead Big Data Rapid Start.
•The Big Data Rapid Start is a fixed-price, two-week engagement delivered by Rittman Mead’s team of Oracle, Big Data and Data Discovery consultants, designed to quickly provide everything required to begin discovering the hidden value of your data.
•Move forward with confidence in the technology, process and application of Big Data Discovery with the support of the world’s leaders.
Big Data Rapid Start from Rittman Mead
•Articles on the Rittman Mead Blog
‣http://www.rittmanmead.com/category/oracle-big-data-appliance/
‣http://www.rittmanmead.com/category/big-data/
‣http://www.rittmanmead.com/category/oracle-big-data-discovery/
•Rittman Mead offer consulting, training and managed services for Oracle Big Data
‣Oracle & Cloudera partners
‣http://www.rittmanmead.com/bigdata
Additional Resources