This document summarizes a TIBCO Advanced Analytics meetup. It includes an agenda for presentations on TIBCO Analytics and data science, predictive analytics using TERR expressions, real-time analytics, APIs, and a question/answer wrap-up session. It also provides overviews of the Spotfire platform for data visualization and analytics, Spotfire capabilities for accessing and preparing data from various sources, and supported data sources.
Extending the Reach of R to the Enterprise with TERR and Spotfire (Lou Bajuk)
An overview of how TIBCO integrates dynamic, interactive visual applications in Spotfire with predictive and advanced analytics in the R language, using TIBCO Enterprise Runtime for R, our R-compatible, enterprise-grade platform for the R language.
Democratizing Data Science Using Spark, Hive and Druid (DataWorks Summit)
MZ is re-inventing how the entire world experiences data via our mobile games division MZ Games Studios, our digital marketing division Cognant, and our live data platform division Satori.
The growing need for data science capabilities across the organization requires an architecture that can democratize the building of these applications and disseminate the resulting insights to the wider organization.
Attend this session to learn how we built a data science platform using Spark, Hive, and Druid for our performance marketing division, Cognant. This platform powers several data science applications, such as fraud detection and bid optimization, at large scale.
We will share lessons learned over the past three years of building this platform, walking through some of the actual data science applications built on top of it.
Attendees with ML engineering or data science backgrounds can gain deep insight from our experience building this platform.
Speakers
Pushkar Priyadarshi, Director of Engineering, Machine Zone Inc.
Igor Yurinok, Staff Software Engineer, MZ
Journey to Creating a 360 View of the Customer: Implementing Big Data Strateg... (Databricks)
The modernization of the tobacco industry is resulting in a shift towards a more data-driven approach to trade, operations and the consumer. The need to scale while maintaining margins is paramount, and today’s consumer requires more personalized engagement and value at every interaction to drive sales and revenue.
At Altria, we’re at the forefront of this evolution, leveraging hundreds of terabytes of big data (such as point-of-sale, clickstream, mobile data, and more) and machine learning to improve our ability to make smarter decisions and outpace the competition. This talk recaps our big data journey from a legacy data infrastructure (Teradata), isolated data systems, and a lack of resources that prevented us from moving quickly and scaling, to our current state, where we have successfully implemented, architected, and on-boarded tools and processes across the stages of data acquisition, storage, preparation, and business intelligence with Azure Data Lake, Azure Databricks, Azure Data Factory, API Management, and streaming and hosting technologies, providing a data analytics platform.
We’ll discuss the roadblocks we came across, how we overcame them, and how we employed a unified approach to big data and analytics through the fully managed Azure Databricks platform and the Azure suite of tools, which allowed us to streamline workflows, improve operational performance, and ultimately introduce new customer experiences that drive engagement and revenue.
Databricks Whitelabel: Making Petabyte Scale Data Consumable to All Our Custo... (Databricks)
Wejo has the largest connected vehicle dataset in the world, processing 17 billion data points a day. Our data is of value to customers in multiple industries and of multiple sizes. By utilising the Databricks white-label offering, which allows controlled, secure access to our data, we have opened up the unique value of Wejo data to a whole new user base.
An Introduction to Graph: Database, Analytics, and Cloud Services (Jean Ihm)
Graph analysis employs powerful algorithms to explore and discover relationships in social network, IoT, big data, and complex transaction data. Learn how graph technologies are used in applications such as fraud detection for banking, customer 360, public safety, and manufacturing. This session will provide an overview and demos of graph technologies for Oracle Cloud Services, Oracle Database, NoSQL, Spark and Hadoop, including PGX analytics and the PGQL property graph query language.
Presented at Analytics and Data Summit, March 20, 2018
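The centrality algorithms such sessions demonstrate can be surprisingly compact. As an illustration only (not code from any Oracle product), here is a minimal pure-Python PageRank, the kind of measure used to surface influential nodes in fraud-detection graphs; the damping factor, iteration count, and toy graph are all assumptions for the sketch:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """PageRank by power iteration. graph: dict node -> list of out-neighbours."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly across all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank

# Toy "transaction" graph: the account receiving from many sources ranks highest.
g = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(g)
print(max(ranks, key=ranks.get))  # "c"
```

In a real deployment this computation would run inside an engine like PGX rather than in application code; the sketch only shows the idea.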
Reaching scale limits on a Hadoop platform: issues and errors created by spee... (DataWorks Summit)
Santander UK’s Big Data journey began in 2014, using Hadoop to make the most of our data and generate value for customers. Within 9 months, we created a highly available, real-time, customer-facing application for customer analytics. We currently have 500 different people doing their own analysis and projects with this data, spanning a total of 50 different use cases. This data (consisting of over 40 million customer records with billions of transactions) provides our business with new insights that were inaccessible before.
Our business moves quickly, with several products and 20 use cases currently in production. We currently have a customer data lake and a technical data lake. Having a platform with very different workloads has proven to be challenging.
Our success in generating value created such growth in terms of data, use cases, analysts, and usage patterns that three years later we are finding scalability issues in HDFS, the Hive metastore, and Hadoop operations, and challenges with highly available architectures built on HBase, Flume, and Kafka. Going forward we are exploring alternative architectures, including a hybrid cloud model, and moving towards streaming.
Our goal with this session is to assist people in the early part of their journey by building a solid foundation. We hope that others can benefit from us sharing our experiences and lessons learned during our journey.
Speaker
Nicolette Bullivant, Head of Data Engineering at Santander UK Technology, Santander UK Technology
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition (Rittman Analytics)
Presentation at ODTUG KScope'18 on the data engineering and advanced analytics capabilities in Oracle Analytics Cloud Data Lake Edition, Oracle Big Data Cloud and Oracle Event Hub Cloud Service
What’s New with Databricks Machine Learning (Databricks)
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Phar Data Platform: From the Lakehouse Paradigm to the Reality (Databricks)
Despite the increased availability of ready-to-use generic tools, more and more enterprises are deciding to build in-house data platforms. This practice, common for some time in research labs and digital-native companies, is now making waves across large enterprises that traditionally used proprietary solutions and outsourced most of their IT. The availability of large volumes of data, coupled with increasingly complex analytical use cases driven by innovations in data science, has rendered these traditional, on-premises architectures obsolete in favor of cloud architectures powered by open source technologies.
The idea of building an in-house platform at a larger enterprise comes with many challenges of its own: building an architecture that combines the best elements of data lakes and data warehouses to accommodate use cases from BI to ML; the need to interoperate with all the company’s data and technology, including legacy systems; and cultural transformation, including a commitment to adopt agile processes and data-driven approaches.
This presentation describes a success story on building a Lakehouse in an enterprise such as LIDL, a successful chain of grocery stores operating in 32 countries worldwide. We will dive into the cloud-based architecture for batch and streaming workloads based on many different source systems of the enterprise and how we applied security on architecture and data. We will detail the creation of a curated Data Lake comprising several layers from a raw ingesting layer up to a layer that presents cleansed and enriched data to the business units as a kind of Data Marketplace.
A lot of focus and effort went into building a semantic Data Lake as a sustainable and easy-to-use basis for the Lakehouse, as opposed to just dumping source data into it. The first use case applied to the Lakehouse is the Lidl Plus Loyalty Program. It is already deployed to production in 26 countries, with data from more than 30 million customers analyzed on a daily basis. In parallel to productionizing the Lakehouse, a cultural and organizational change process was undertaken to get all involved units to buy into the new data-driven approach.
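The layered flow the talk describes (raw ingestion up to a business-facing "Data Marketplace") can be sketched in miniature. The layer functions, record fields, and sample data below are invented for illustration, with in-memory lists standing in for storage zones; they are not LIDL's actual schema or pipeline:

```python
# Raw layer: data as ingested, untyped and possibly noisy.
raw_layer = [
    {"store": "de-001", "amount": "12.30", "ts": "2021-05-01"},
    {"store": "de-001", "amount": "bad",   "ts": "2021-05-01"},  # ingest noise
]

def to_cleansed(raw):
    """Cleansed layer: parse types and drop records that fail validation."""
    out = []
    for rec in raw:
        try:
            out.append({"store": rec["store"],
                        "amount": float(rec["amount"]),
                        "ts": rec["ts"]})
        except ValueError:
            pass  # a real pipeline would quarantine rejected records
    return out

def to_marketplace(cleansed):
    """Enriched layer: aggregate into a business-facing view."""
    totals = {}
    for rec in cleansed:
        totals[rec["store"]] = totals.get(rec["store"], 0.0) + rec["amount"]
    return totals

print(to_marketplace(to_cleansed(raw_layer)))  # {'de-001': 12.3}
```

The point of the layering is that each zone has a contract: downstream consumers see only validated, typed, aggregated data rather than raw source dumps.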
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...) (Revolution Analytics)
Presented by David Smith, Chief Community Officer, Revolution Analytics, at the Gartner Business Intelligence and Analytics Summit, April 2014.
In this presentation, I'll introduce the open source R language — the modern standard for Data Science — and the enhanced performance, scalability and ease-of-use capabilities of Revolution R Enterprise. Customer case studies will illustrate Revolution R Enterprise as a component of the real-time analytics deployment process, via integration with Hadoop, database warehousing systems and Cloud platforms, to implement data-driven end-user applications.
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016 (StampedeCon)
This session will detail best practices for architecting, building, operating and managing an Analytics Data Lake platform. Key topics will include:
1) Defining next-generation Data Lake architectures. The de facto standard has been commodity DAS servers with HDFS, but there are now multiple solutions aimed at separating compute and storage, virtualizing or containerizing Hadoop applications, and utilizing Hadoop-compatible or embedded HDFS filesystems. This portion will explore the options available, and the pros and cons of each.
2) Data Ingest. There are many ways to load data into a Data Lake, including standardized Apache tools (Sqoop, Flume, Kafka, Storm, Spark, NiFi), standard file and object protocols (SFTP, NFS, REST, WebHDFS), and proprietary tools (e.g., Zaloni Bedrock, DataTorrent). This section will explore these options in the context of best fit to workflows; it will also look at key gaps and challenges, particularly in the areas of data formats and integration with metadata/cataloging tools.
3) Metadata & Cataloguing. One of the biggest inhibitors of successful Data Lake deployments is Data Governance, particularly in the areas of indexing, cataloguing and metadata management. It is nearly impossible to run analytics on top of a Data Lake and get meaningful & timely results without solving these problems. This portion will explore both emerging open standards (Apache Atlas, HCatalog) and proprietary tools (Cloudera Navigator, Zaloni Bedrock/Mica, Informatica Metadata Manager), and balance the pros, cons and gaps of each.
4) Security & Access Controls. Solving these challenges is key for adoption in regulation-driven industries like Healthcare & Financial Services. There are multiple Apache projects and proprietary tools to address this, but the challenge is making security and access controls consistent across the entire application and infrastructure stack and over the data lifecycle, and being able to audit this in the face of legal challenges. This portion will explore available options and best practices.
5) Provisioning & Workflow Management. The real promise of the Data Lake is integrating analytics workflows and tools on converged infrastructure, with shared data, and building “As A Service” architectures oriented towards self-service data exploration and analytics for end users. This is an emerging and immature area, but this session will explore some potential concepts, tools and options to achieve this.
This will be a moderately technical session, with the above topics being illustrated by real world examples. Attendees should have basic familiarity with Hadoop and the associated Apache projects.
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration (Cesare Cugnasco)
Data visualization can be a tricky problem, even more so when the dataset consists of several billion 3-dimensional particles moving over time. The talk will focus on some simple indexing and data-thinning techniques and how (and how not) to implement them with Cassandra and Spark.
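As a hedged illustration of what "data thinning" can mean here (the talk's actual techniques are not specified in the abstract), one of the simplest approaches is grid-based downsampling: keep one representative particle per 3-D cell. The cell size and sample coordinates below are invented:

```python
def thin(points, cell=1.0):
    """Keep the first point seen in each (x, y, z) grid cell of side `cell`."""
    seen = {}
    for p in points:
        key = tuple(int(c // cell) for c in p)  # quantize each coordinate
        if key not in seen:
            seen[key] = p
    return list(seen.values())

particles = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6),   # same cell -> collapsed
             (1.5, 0.2, 0.3), (2.7, 3.1, 0.9)]
print(len(thin(particles, cell=1.0)))  # 3 representatives remain
```

In a Cassandra-backed design, a quantized cell key like this could plausibly double as a partition key so that nearby particles land together, though that is a design assumption, not something the abstract states.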
Delta Lake: Open Source Reliability w/ Apache Spark (George Chow)
As presented: Sajith Appukuttan, Solution Architect, Databricks
Sept 12, 2019 at Vancouver Spark Meetup
Abstract: Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Why would you want to use R capabilities within FME? It allows you to bring together the strengths of both applications. FME is capable of reading and writing a wide variety of formats, and has powerful data preparation and clean-up transformations. R offers a wide variety of statistical tools, including classical statistical analysis, data analysis, analytics, and plotting. We’ll show you how to get the best of both worlds.
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl... (Lucas Jellema)
Our technology has gotten smart and fast enough to make predictions and come up with recommendations in near real time. Machine Learning is the art of deriving models from our Big Data collections – harvesting historic patterns and trends – and applying those models to new data in order to rapidly and adequately respond to that data. This presentation will explain and demonstrate in simple, straightforward terms and using easy to understand practical examples what Machine Learning really is and how it can be useful in our world of applications, integrations and databases. Hadoop and Spark, real time and streaming analytics, Watson and Cloud Datalab, Jupyter Notebooks, Oracle Machine Learning CS and the Citizen Data Scientists all make their appearance, as does SQL.
Processing transactions is at the core of any bank’s business. Danske Bank’s journey started with recognising the value that could be gleaned from generating insights from the data to improve customer behaviour analytics. Today, the company streams large volumes of transactional data in near-real time onto its Hortonworks Data Platform to improve fraud detection and customer marketing. In this session, Nadeem will outline the bank’s vision, how it was socialised across the executive board team and the resulting sponsorship, the technological path, the challenges overcome, and the results, which have not only improved the customer experience but also delivered quantifiable fraud metrics and opened new revenue streams. Furthermore, Nadeem will cover future use cases around maintenance and operations.
Moving Targets: Harnessing Real-time Value from Data in Motion (Inside Analysis)
The Briefing Room with David Loshin and Datawatch
Live Webcast Feb. 17, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=4a053043c45cf0c2f6453dfb8577c72a
Patience may be a virtue, but when it comes to streaming analytics, waiting is not an option. Between Big Data and the Internet of Things, businesses are faced with more data and greater complexity than ever before. Traditional information architectures simply cannot support the kind of processing necessary to make use of this fast-moving resource. The modern context requires a shorter path to analytics, one that narrows the gap between governance and discovery.
Register for this episode of The Briefing Room to hear veteran Analyst David Loshin as he explains how the prevalence of streaming data is changing business pace and processes. He’ll be briefed by Dan Potter of Datawatch, who will tout his company’s real-time data discovery platform for data in motion. He will show how self-service data preparation can lead to faster insights, ultimately fostering the ability to make precise decisions at the right time.
Visit InsideAnalysis.com for more information.
An Introduction to Graph: Database, Analytics, and Cloud ServicesJean Ihm
Graph analysis employs powerful algorithms to explore and discover relationships in social network, IoT, big data, and complex transaction data. Learn how graph technologies are used in applications such as fraud detection for banking, customer 360, public safety, and manufacturing. This session will provide an overview and demos of graph technologies for Oracle Cloud Services, Oracle Database, NoSQL, Spark and Hadoop, including PGX analytics and PGQL property graph query language.
Presented at Analytics and Data Summit, March 20, 2018
Reaching scale limits on a Hadoop platform: issues and errors created by spee...DataWorks Summit
Santander UK’s Big Data journey began in 2014, using Hadoop to make the most of our data and generate value for customers. Within 9 months, we created a highly available real-time customer facing application for customer analytics. We currently have 500 different people doing their own analysis and projects with this data, spanning a total of 50 different use cases. This data, (consisting of over 40 million customer records with billions of transactions), provides our business new insights that were inaccessible before.
Our business moves quickly, with several products and 20 use cases currently in production. We currently have a customer data lake and a technical data lake. Having a platform with very different workloads has proven to be challenging.
Our success in generating value created such growth in terms of data, use cases, analysts and usage patterns that 3 years later we find issues with scalability in HDFS, Hive metastore and Hadoop operations and challenges with highly available architectures with Hbase, Flume and Kafka. Going forward we are exploring alternative architectures including a hybrid cloud model, and moving towards streaming.
Our goal with this session is to assist people in the early part of their journey by building a solid foundation. We hope that others can benefit from us sharing our experiences and lessons learned during our journey.
Speaker
Nicolette Bullivant, Head of Data Engineering at Santander UK Technology, Santander UK Technology
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionRittman Analytics
Presentation at ODTUG KScope'18 on the data engineering and advanced analytics capabilities in Oracle Analytics Cloud Data Lake Edition, Oracle Big Data Cloud and Oracle Event Hub Cloud Service
What’s New with Databricks Machine LearningDatabricks
In this session, the Databricks product team provides a deeper dive into the machine learning announcements. Join us for a detailed demo that gives you insights into the latest innovations that simplify the ML lifecycle — from preparing data, discovering features, and training and managing models in production.
Phar Data Platform: From the Lakehouse Paradigm to the RealityDatabricks
Despite the increased availability of ready-to-use generic tools, more and more enterprises are deciding to build in-house data platforms. This practice, common for some time in research labs and digital native companies, is now making its waves across large enterprises that traditionally used proprietary solutions and outsourced most of their IT. The availability of large volumes of data, coupled with more and more complex analytical use cases driven by innovations in data science have yielded these traditional and on premise architectures to become obsolete in favor of cloud architectures powered by open source technologies.
The idea of building an in-house platform at a larger enterprise comes with many challenges of its own: Build an Architecture that combines the best elements of data lakes and data warehouses to accommodate all kinds from BI to ML use cases. The need to interoperate with all the company’s data and technology, including legacy systems. Cultural transformation, including a commitment to adopt agile processes and data driven approaches.
This presentation describes a success story on building a Lakehouse in an enterprise such as LIDL, a successful chain of grocery stores operating in 32 countries worldwide. We will dive into the cloud-based architecture for batch and streaming workloads based on many different source systems of the enterprise and how we applied security on architecture and data. We will detail the creation of a curated Data Lake comprising several layers from a raw ingesting layer up to a layer that presents cleansed and enriched data to the business units as a kind of Data Marketplace.
A lot of focus and effort went into building a semantic Data Lake as a sustainable and easy to use basis for the Lakehouse as opposed to just dumping source data into it. The first use case being applied to the Lakehouse is the Lidl Plus Loyalty Program. It is already deployed to production in 26 countries with more than 30 millions of customers’ data being analyzed on a daily basis. In parallel to productionizing the Lakehouse, a cultural and organizational change process was undertaken to get all involved units to buy into the new data driven approach.
Big Data Predictive Analytics with Revolution R Enterprise (Gartner BI Summit...Revolution Analytics
Presented by David Smith, Chief Community Officer, Revolution Analytics at Garner Business Intelligence and Analytics Summit, April 2014.
In this presentation, I'll introduce the open source R language — the modern standard for Data Science — and the enhanced performance, scalability and ease-of-use capabilities of Revolution R Enterprise. Customer case studies will illustrate Revolution R Enterprise as a component of the real-time analytics deployment process, via integration with Hadoop, database warehousing systems and Cloud platforms, to implement data-driven end-user applications.
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
This session will detail best practices for architecting, building, operating and managing an Analytics Data Lake platform. Key topics will include:
1) Defining next-generation Data Lake architectures. The defacto standard has been commodity DAS servers with HDFS, but there are now multiple solutions aimed at separating compute and storage, virtualizing or containerizing Hadoop applications, and utilizing Hadoop compatible or embedded HDFS filesystems. This portion will explore the options available, and the pros and cons of each.
2) Data Ingest. There are many ways to load data into a Data Lake, including standardized Apache tools (Sqoop, Flume, Kafka, Storm, Spark, NiFi), standard file and object protocols (SFTP, NFS, Rest, WebHDFS), and proprietary tools (eg, Zaloni Bedrock, DataTorrent). This section will explore these options in the context of best fit to workflows; it will also look at key gaps and challenges, particularly in the areas of data formats and integration with metadata/cataloging tools.
3) Metadata & Cataloguing. One of the biggest inhibitors of successful Data Lake deployments is Data Governance, particularly in the areas of indexing, cataloguing and metadata management. It is nearly impossible to run analytics on top of a Data Lake and get meaningful & timely results without solving these problems. This portion will explore both emerging open standards (Apache Atlas, HCatalog) and proprietary tools (Cloudera Navigator, Zaloni Bedrock/Mica, Informatica Metadata Manager), and balance the pros, cons and gaps of each.
4) Security & Access Controls. Solving these challenges are key for adoption in regulatory driven industries like Healthcare & Financial Services. There are multiple Apache projects and proprietary tools to address this, but the challenge is making security and access controls consistent across the entire application and infrastructure stack, and over the data lifecycle, and being able to audit this in the face of legal challenges. This portion will explore available options and best practices.
5) Provisioning & Workflow Management. The real promise of the Data Lake is integrating Analytics workflows and tools on converged infrastructure-with shared data-and build “As A Service” oriented architectures that are oriented towards self-service data exploration and Analytics for end users. This is an emerging and immature area, but this session will explore some potential concepts, tools and options to achieve this.
This will be a moderately technical session, with the above topics being illustrated by real world examples. Attendees should have basic familiarity with Hadoop and the associated Apache projects.
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
Data visualization can be a tricky problem, even more if the dataset is made of several billions of 3-dimensional particles moving along the time. The talk will focus on some simple indexing and data thinning techniques and how (and how do not) implement them with Cassandra and Spark.
Delta Lake: Open Source Reliability w/ Apache SparkGeorge Chow
As presented: Sajith Appukuttan, Solution Architect, Databricks
Sept 12, 2019 at Vancouver Spark Meetup
Abstract: Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Why would you want to use R capabilities within FME. It allows you to bring together the strengths of both applications. FME is capable of reading and writing to a wide variety of formats. FME has powerful data preparation and data clean-up transformations. R offers a wide range of wide variety of statistical tools including classical statistical a wide range data analysis, analytics and plotting tools. We’ll show you how to get the best of both worlds.
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl...Lucas Jellema
Our technology has gotten smart and fast enough to make predictions and come up with recommendations in near real time. Machine Learning is the art of deriving models from our Big Data collections – harvesting historic patterns and trends – and applying those models to new data in order to rapidly and adequately respond to that data. This presentation will explain and demonstrate in simple, straightforward terms and using easy to understand practical examples what Machine Learning really is and how it can be useful in our world of applications, integrations and databases. Hadoop and Spark, real time and streaming analytics, Watson and Cloud Datalab, Jupyter Notebooks, Oracle Machine Learning CS and the Citizen Data Scientists all make their appearance, as does SQL.
Processing transactions is at the core of any bank’s business. Danske Bank’s journey started with recognising the value that could be gleaned from generating insights from the data to improve customer behaviour analytics. Today, the company streams large volumes of transactional data in near-real time onto its Hortonworks data Platform to improve fraud detection and customer marketing. In this session, Nadeem will outline the bank’s vision, how it was socialised across the executive board team and the resulting sponsorship, the technological path, challenges overcome and the results that have not only improved the customer experience but quantifiable metrics fraud and opening new revenue streams. Furthermore, Nadeem will cover future use cases around maintenance and operations.
Moving Targets: Harnessing Real-time Value from Data in Motion Inside Analysis
The Briefing Room with David Loshin and Datawatch
Live Webcast Feb. 17, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=4a053043c45cf0c2f6453dfb8577c72a
Patience may be a virtue, but when it comes to streaming analytics, waiting is not an option. Between Big Data and the Internet of Things, businesses are faced with more data and greater complexity than ever before. Traditional information architectures simply cannot support the kind of processing necessary to make use of this fast-moving resource. The modern context requires a shorter path to analytics, one that narrows the gap between governance and discovery.
Register for this episode of The Briefing Room to hear veteran Analyst David Loshin as he explains how the prevalence of streaming data is changing business pace and processes. He’ll be briefed by Dan Potter of Datawatch, who will tout his company’s real-time data discovery platform for data in motion. He will show how self-service data preparation can lead to faster insights, ultimately fostering the ability to make precise decisions at the right time.
Visit InsideAnalysis.com for more information.
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningKai Wähner
Comparison of Data Preparation vs. Data Wrangling Programming Languages, Frameworks and Tools in Machine Learning / Deep Learning Projects.
A key task to create appropriate analytic models in machine learning or deep learning is the integration and preparation of data sets from various sources like files, databases, big data storages, sensors or social networks. This step can take up to 80% of the whole project.
This session compares different alternative techniques to prepare data, including extract-transform-load (ETL) batch processing (like Talend, Pentaho), streaming analytics ingestion (like Apache Storm, Flink, Apex, TIBCO StreamBase, IBM Streams, Software AG Apama), and data wrangling (DataWrangler, Trifacta) within visual analytics. Various options and their trade-offs are shown in live demos using different advanced analytics technologies and open source frameworks such as R, Python, Apache Hadoop, Spark, KNIME or RapidMiner. The session also discusses how this is related to visual analytics tools (like TIBCO Spotfire), and best practices for how the data scientist and business user should work together to build good analytic models.
Key takeaways for the audience:
- Learn various options for preparing data sets to build analytic models
- Understand the pros and cons and the targeted persona for each option
- See different technologies and open source frameworks for data preparation
- Understand the relation to visual analytics and streaming analytics, and how these concepts are actually leveraged to build the analytic model after data preparation
Video Recording / Screencast of this Slide Deck: https://youtu.be/2MR5UynQocs
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
Talking about the ease of use and handling Big Data technologies in the Cloud. Using Google Cloud Platform and Amazon Web Services and all of the tools around it.
Showing the problems and how we can solve them with simple tools.
Big data is an opportunity for communications service providers (CSPs) to create the intelligence for operating their infrastructures more efficiently, to analyze the success of their services, and to create a better personal experience for their customers.
CSP Top executives, Network and IT managers and Marketing, are eager to exploit the large amount of information to achieve better business decisions. They expect their Chief Technical Officer to provide end-to-end analytic solutions based on the data available in their IT and network infrastructure.
This presentation analyzes the complete value chain that can transform CSPs’ data to knowledge. It covers the sources of information, the data collection tools, the analytic platforms providing quick data access, and finally the business intelligence use cases with the presentation and visualization of the results and predictions.
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ...Codemotion
The world gets connected more and more every year due to Mobile, Cloud and the Internet of Things. "Big Data" is currently a big hype. Large amounts of historical data are stored in Hadoop to find patterns, e.g. for predictive maintenance or cross-selling. But how can you increase revenue or reduce risk in new transactions? "Fast Data" via stream processing is the solution for embedding patterns into future actions in real time. This session discusses how machine learning and analytic models with R, Spark MLlib, H2O, etc. can be integrated into real-time event processing. A live demo concludes the session.
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Kai Wähner
Streaming Analytics Comparison of Open Source Frameworks, Products and Cloud Services. Includes Apache Storm, Flink, Spark, TIBCO, IBM, AWS Kinesis, Striim, Zoomdata, ...
This session discusses the technical concepts of stream processing / streaming analytics and how it is related to big data, mobile, cloud and internet of things. Different use cases such as predictive fault management or fraud detection are used to show and compare alternative frameworks and products for stream processing and streaming analytics.
The focus of the session lies on comparing
- different open source frameworks such as Apache Apex, Apache Flink or Apache Spark Streaming
- engines from software vendors such as IBM InfoSphere Streams, TIBCO StreamBase
- cloud offerings such as AWS Kinesis.
- real time streaming UIs such as Striim, Zoomdata or TIBCO Live Datamart.
Live demos will give the audience a good feeling about how to use these frameworks and tools.
The session will also discuss how stream processing is related to Apache Hadoop frameworks (such as MapReduce, Hive, Pig or Impala) and machine learning (such as R, Spark ML or H2O.ai).
The New Frontier: Optimizing Big Data ExplorationInside Analysis
The Briefing Room with Dr. Robin Bloor and Cirro
Live Webcast on February 11, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=0ec1fa381886313cc06d841015c65898
As information ecosystems continue to expand, businesses are searching for ways to combine traditional analytics with a new source of insight: Big Data. But with data flooding in from all kinds of sources, fast access and performance at scale can easily become an issue. One effective approach for solving this challenge is data federation, a method that involves taking the analytical processing to the data, allowing streamlined access to multiple data sources without the expensive ETL overhead or building of semantic layers.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains how the prevalence of distributed data calls for a new approach to Big Data. He will be briefed by Mark Theissen of Cirro, who will tout his company’s Data Hub, a data federation solution that provides a single point of access to all enterprise data assets without excessive data movements, preprocessing or staging. He will discuss how data federation differs from virtualization and ETL approaches, and demonstrate how a Cirro deployment solves the analytics challenge of integrating data silos across the data center – and the cloud – using the BI tools you already have on your desktop for real-time distributed analytics.
Visit InsideAnalysis.com for more information.
Horses for Courses: Database RoundtableEric Kavanagh
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
At Monsanto, emerging technologies such as IoT, advanced imaging and geo-spatial platforms; molecular breeding, ancestry and genomics data sets have made us rethink how we approach developing, deploying, scaling and distributing our software to accelerate predictive and prescriptive decisions. We created a Cloud based Data Science platform for the enterprise to address this need. Our primary goals were to perform analytics@scale and integrate analytics with our core product platforms.
As part of this talk, we will share our journey of transformation, showing how we enabled a collaborative discovery analytics environment for data science teams to perform model development; provisioned data through APIs and streams; deployed models to production through our auto-scaling big-data compute in the cloud to perform streaming, cognitive, predictive, prescriptive, historical and batch analytics@scale; and integrated analytics with our core product platforms to turn data into actionable insights.
Applying R in BI and Real Time applications EARL London 2015Lou Bajuk
Overview of the challenges of applying R in enterprise analytic applications, and TIBCO's approach to these challenges with Spotfire, TERR and Streambase.
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
Watch full webinar here: https://bit.ly/39AhUB7
Enterprise organizations are shifting to self-service analytics as business users need real-time access to holistic and consistent views of data regardless of its location, source or type for arriving at critical decisions.
Data Virtualization and Data Visualization work together through a universal semantic layer. Learn how they enable self-service data discovery and improve performance of your reports and dashboards.
In this session, you will learn:
- Challenges faced by business users
- How data virtualization enables self-service analytics
- Use case and lessons from customer success
- Overview of the highlight features in Tableau
Over 90% of today’s data has been generated in the last two years, and growth rates continue to climb. In this session, we’ll step through challenges and best practices with data capturing, how to derive meaningful insights to help predict the future, and common pitfalls in data analysis.
Come discover how integrated solutions involving Amazon S3, AWS Glue, Amazon Redshift, Amazon Athena, Amazon EMR, Amazon Kinesis, and Amazon Machine Learning/Deep Learning result in effective data systems for data scientists and business users, alike.
Machine learning applications are typically stitched together from hopes and dreams, shell scripts, cron jobs, home-grown schedulers, snippets of configuration clipped from multiple blog posts, thousands of hard-coded business rules, a.k.a. "our SQL corpus," and a few lines of training and testing code. Organizing all the moving parts into something maintainable and supportive of ongoing development is a challenge most teams have on their TODO list, roadmap, or tech debt pile. Getting ahead of the day-to-day demands and settling into a sane architecture often seems like an unattainable goal. The past several years have seen an explosion of tool-building in the data engineering and analytics area, including in Apache projects spanning the areas of search and information retrieval, job orchestration, file and stream formats, and machine learning libraries. In this talk we will cover our product and development teams' choices of architecture and tools, from data ingestion and storage, through transformations and processing, to presentation of results and publishing to web services, reports, and applications.
Similar to TIBCO Advanced Analytics Meetup (TAAM) - June 2015 (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
7. Spotfire Data Access
Data sources: RDBMS, XML, flat files, spreadsheets, cubes, Hadoop and big data stores, analytical DWs (e.g. Exadata), event data streams, ActiveSpaces.
• In-memory: load data from the source into memory
• In-database: leave the data in the database (SQL/MDX); dynamically load and discard data to visualize
• On-demand: dynamically swap data in and out of memory
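The in-memory vs in-database distinction above can be made concrete with a toy sketch. This assumes a throwaway sqlite3 table invented for illustration (Spotfire itself generates the SQL/MDX for you): either pull every row into the client and aggregate locally, or push the aggregation down to the database and fetch only the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 75.0)])

# In-memory style: load all rows from the source, then aggregate locally.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
in_memory = {}
for region, amount in rows:
    in_memory[region] = in_memory.get(region, 0.0) + amount

# In-database style: leave the data in the DB, push the SQL down, and
# fetch only the aggregated result needed for the visualization.
in_db = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
```

Both routes give the same answer; what differs is how much data moves, which is exactly the trade-off the three Spotfire access modes manage.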
15. TIBCO is the only analytics platform that can provide value to the organization across the full spectrum of use cases: self-service dashboards, event processing, and predictive and prescriptive analytics. The Analytics Maturity Model runs from immediate value to long-term competitive advantage: Measure, Diagnose, Predict, Optimize, Operationalize, Automate.
26. Analytics Maturity Model (recap of slide 15).
34. Spotfire-TERR Expression Functions
1. In-line expressions
– Type R code into the expression field in Spotfire, e.g. to color a graph by clusters or to smooth points on a graph
– Use the inbuilt TERR_* expression functions
– Many entry points for adding expressions
2. Expression functions
– Choose an expression function from the menu: inbuilt, or an extension (written by you or someone else) via R code
– Use it just like other expression functions in an expression
– Many entry points for adding expressions
45. Analytics Maturity Model (recap of slide 15).
48. Managing Industrial Equipment
Big Data
– Analysis of production
– Failure analytics
Fast Data
– Real-time sensor data
– Leading indicators for shutdowns
– Drilling: kick detection
– Flow monitoring
Benefits
– Reduced NPT: big $$s
– System reliability
– Efficient drilling
49. Managing Industrial Equipment
1. Study anomalies
2. Find leading indicators
3. Backtest rules / models
4. Push rules / models to the event server
52. Optimizing Manufacturing Processes
Big Data
– Analysis of product quality
– Models for yield
– Models for defects
Fast Data
– In-line QA/QC
Benefits
– Maximize productivity
– Improve quality
– Optimize machine operations
54. Customer Offers for Retailers
Big Data
– Customer propensity to purchase products
– Product affinity
– Customer segmentation
Fast Data
– In-line scoring on transactions
– Targeted offers to customers
Benefits
– Optimize inventory
– Enhance customer experience
57. Spotfire APIs
• IronPython controls the behavior of Spotfire
• We maintain a library of IronPython functions, e.g.
– toggling all zoom sliders
– adding marker layers to a map
– … and many more
58. Today's Presenters: Jagrata Minardi
Jagrata Minardi is a Staff Solutions Consultant with TIBCO Software, supporting Financial Services and other industries. Previously, he worked for Insightful Corporation, a provider of analytic software and solutions. Since 1997, he has supported customers in the areas of portfolio construction, portfolio management, asset price forecasting, risk modeling, and risk aggregation.
60. Today's Presenters: Ian Cook
Ian Cook is a Data Scientist at TIBCO focused on applying the R statistical programming language to rapidly solve business problems across industry verticals. Ian founded and organizes the R users group in the Raleigh, North Carolina area. Prior to his role at TIBCO, Ian worked as a statistical software developer for the semiconductor company Advanced Micro Devices.
65. Today's Presenters: Ujval Kamath
Ujval Kamath is a Data Scientist at TIBCO. He is focused on developing predictive models in R that are deployed in Spotfire and StreamBase for data in motion and data at rest. He has experience in a range of industries, including Oil and Gas/Energy, Consumer Packaged Goods, Manufacturing, and Computing.
66. Spotfire and StreamBase
• Spotfire is used to create and analyze customer segmentation and propensity
• StreamBase is used to score new transactions in real time
• Spotfire is used to understand the demographics of customers around stores
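The split between offline modeling and real-time scoring can be mimicked in plain Python: the coefficients stand in for a propensity model fit offline (the Spotfire/TERR side), and each incoming transaction is scored as it arrives (the StreamBase side). The model, field names and cutoff below are all invented for illustration.

```python
import math

# Hypothetical coefficients from an offline propensity model
# (in the real setup these would be estimated in Spotfire/TERR).
INTERCEPT = -4.0
WEIGHTS = {"basket_value": 0.02, "visits_last_month": 0.35}

def score(transaction):
    """Logistic score: probability the customer responds to an offer."""
    z = INTERCEPT + sum(WEIGHTS[k] * transaction[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def handle_stream(transactions, cutoff=0.5):
    """Emulates the event server: score each transaction as it arrives
    and emit a targeted offer when the score clears the cutoff."""
    offers = []
    for txn in transactions:
        if score(txn) >= cutoff:
            offers.append(txn["customer_id"])
    return offers

stream = [
    {"customer_id": "c1", "basket_value": 250, "visits_last_month": 2},
    {"customer_id": "c2", "basket_value": 30, "visits_last_month": 1},
]
offers = handle_stream(stream)
```

The design point the slide makes is that the model is built where the analyst works and deployed where the events flow; only the scoring function crosses the boundary.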
67. Today's Presenters: Andrew Berridge
Andrew Berridge is a Sr Solution Consultant at TIBCO. He joined the Spotfire data science team in 2011 and has 15 years' experience working in pharmaceuticals and other industries. Andrew specializes in developing tools, extensions and integrations with other technology platforms for Spotfire using IronPython, C#, Java and JavaScript.
68. Extending and Customizing Spotfire!
• Many ways of extending and customizing Spotfire platform
• All APIs are publicly documented, e.g.
– Spotfire .NET API: https://docs.tibco.com/pub/doc_remote/spotfire/7.0.0/doc/api/Index.aspx
• Extend functionality of desktop and web clients:
– TERR scripting
– Data functions
– IronPython scripting
– JavaScript in text areas for UI elements
– C# extensions (tools, transformations, calculations, etc.)
– JavaScript mashup API for embedding in web applications
• JavaScript Visualizations
– Use any JavaScript visualization framework
– e.g. D3, HighCharts
• Extend Automation Services
– Custom tasks
• Custom authentication/Single Sign-on (SSO)
69. Example: Write-back to Database from Spotfire
• Why
– Take action from within your analysis
– Comment on data points
– Update external systems
• How
– SQL within a Spotfire Information Link, with parameters
– Execute the Information Link with IronPython, passing in marked data as parameters
– Other methods can be used; this one is simple
70. SQL in the Information Link
• Must return data to Spotfire – we return the data table
• INSERT then SELECT

INSERT INTO [SimpleDemo].[dbo].[UserActions]
  ([State], [CoC], [Username], [Comment])
VALUES
  (?State, ?CoC, %CURRENT_USER%, ?Comment);
SELECT
  U1."id" AS "ID", U1."DateTime" AS "DATETIME", U1."State" AS "STATE",
  U1."CoC" AS "COC", U1."Username" AS "USERNAME",
  U1."Comment" AS "COMMENT"
FROM
  "SimpleDemo"."dbo"."UserActions" U1
WHERE
  <conditions>
71. IronPython Code
• Iterate over the marked rows in the data table:
– Set up the parameters for the Information Link (name and value)
– Call the Information Link for each marked row
• The Information Link is identified by its GUID in the Spotfire library
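The INSERT-then-SELECT pattern from slide 70 can be exercised standalone. A sketch using sqlite3 in place of the Information Link: the table and columns mirror the slide, and the ? parameters stand in for the Information Link prompts, which Spotfire would fill from marked rows via IronPython.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE UserActions (
    id INTEGER PRIMARY KEY,
    State TEXT, CoC TEXT, Username TEXT, Comment TEXT)""")

def write_back(marked_rows, username):
    """Emulates executing the Information Link once per marked row:
    INSERT the parameters, then SELECT the table back for Spotfire."""
    for state, coc, comment in marked_rows:
        conn.execute(
            "INSERT INTO UserActions (State, CoC, Username, Comment) "
            "VALUES (?, ?, ?, ?)",
            (state, coc, username, comment))
    # The trailing SELECT returns the data table, as the slide requires.
    return conn.execute(
        "SELECT State, CoC, Username, Comment FROM UserActions").fetchall()

rows = write_back([("TX", "A1", "needs review"), ("CA", "B2", "approved")],
                  username="demo_user")
```

The same shape holds in Spotfire: because an Information Link must return data, the write-back statement is paired with a SELECT so the updated table flows straight back into the analysis.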
72. Next Steps with Spotfire
spotfire.tibco.com/trial
spotfire.tibco.com/learn/spotfire-desktop-quickstart
spotfire.tibco.com/learn/spotfire-cloud-quickstart
Register for a live Spotfire demonstration: spotfire.tibco.com/learn/live-demo
77. Webcasts

Insight and Action - Analyzing Your OSIsoft PI System Data
Tuesday, July 7, 2015, 1 PM EST
Presenters: Michael O'Connell & Dave Leigh

Predictive Analytics in the Energy Sector: Asset Valuation
Tuesday, July 28, 2015, 1 PM EST
Presenters: Michael O'Connell & Peter Shaw with Haas Engineering and R Lacy

Seeing Stars: the Gartner BI Bakeoff
Recording, May 27, 2015
Presenters: Anna Nowakowska & Michael O'Connell

Events: spotfire.tibco.com/about-us/events
78. Fast Data: www.tibco.com
http://d2.tibco.com/fast-data-webinars#event-processing-ROI
79. useR!
Lou Bajuk-Yorgan – Spotfire Product Management
Ian Cook – Data Scientist
Difei Luo – Data Scientist
If you would like to set up a meeting, please contact Lou Bajuk-Yorgan at lbajuk@tibco.com or Lars Sveding at lsveding@tibco.com.