These are high-level considerations for deciding when to use the Integrated Data Warehouse or Hadoop for a specific workload. There are times when one is the clear choice and times when there are overlapping requirements to consider. We present pros and cons for both, but you must get into the requirements details to make a sensible decision.
Practical guide to architecting data lakes - Avinash Ramineni - Phoenix Data...Avinash Ramineni
Enterprises have been rapidly adopting data lakes as a complement to or replacement of data warehouses. Many data lake implementations ignore the inherent drawbacks and limitations of data lakes and end up as data swamps with little or no benefit to the business. In this session we will go through some of the challenges and the key aspects that need to be considered for successful data lake implementations.
Jethro data meetup index base sql on hadoop - oct-2014Eli Singer
JethroData: an index-based SQL-on-Hadoop engine.
An architecture comparison of MPP/full-scan SQL engines such as Impala and Hive against index-based access such as Jethro.
SQL and NoSQL NYC meetup Oct 20 2014
Boaz Raufman
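The full-scan vs. index-based distinction the talk draws can be seen in miniature with Python's stdlib sqlite3 module. This is an illustrative sketch only — sqlite3 stands in for the engines being compared, and Jethro itself is not used; the exact plan text varies by SQLite version.

```python
import sqlite3

# Illustrative only: sqlite3 stands in for any SQL engine, to show the
# full-scan vs. index-based access distinction the talk compares.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, user TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i, "user%d" % (i % 100), i * 0.5) for i in range(10_000)],
)

# Without an index, even a highly selective predicate scans every row.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user = 'user7'"
).fetchone()[-1]
print(plan_before)  # e.g. "SCAN events"

# With an index, the engine jumps straight to the matching rows.
conn.execute("CREATE INDEX idx_user ON events(user)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user = 'user7'"
).fetchone()[-1]
print(plan_after)  # e.g. "SEARCH events USING INDEX idx_user (user=?)"
```

The same trade-off scales up: an MPP engine parallelizes the scan across nodes, while an index-based engine avoids most of the scan entirely for selective BI-style queries.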
Using Oracle Big Data SQL 3.0 to add Hadoop & NoSQL to your Oracle Data Wareh...Mark Rittman
As presented at OGh SQL Celebration Day in June 2016, NL. Covers new features in Big Data SQL including storage indexes, storage handlers and ability to install + license on commodity hardware
Slides for the talk at AI in Production meetup:
https://www.meetup.com/LearnDataScience/events/255723555/
Abstract: Demystifying Data Engineering
With recent progress in the fields of big data analytics and machine learning, Data Engineering is an emerging discipline which is not well-defined and often poorly understood.
In this talk, we aim to explain Data Engineering, its role in Data Science, the difference between a Data Scientist and a Data Engineer, the role of a Data Engineer and common concepts as well as commonly misunderstood ones found in Data Engineering. Toward the end of the talk, we will examine a typical Data Analytics system architecture.
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?Mark Rittman
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill; customers are using those to provide reporting over their Hadoop and relational data platforms, and looking to add capabilities such as calculation engines, data integration and federation along with in-memory caching to create complete analytic platforms. In this session we’ll look at the options that are available, compare database vendor solutions with their open-source alternative, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete “data fabric” solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
Tired of seeing the loading spinner of doom while trying to analyze your big data on Tableau? Learn how Jethro accelerates your database so you can interactively analyze your big data on Tableau and gain the crucial insights that you need without losing your train of thought. Jethro gives you complete flexibility, with no need for partitions to speed up queries. This presentation will explain why indexing is a superior architecture to MPP for the BI use case when dealing with big data.
Effective data governance is imperative to the success of Data Lake initiatives. Without governance policies and processes, information discovery and analysis is severely impaired. In this session we will provide an in-depth look into the Data Governance Initiative launched collaboratively between Hortonworks and partners from across industries. We will cover the objectives of Data Governance Initiatives and demonstrate key governance capabilities of the Hortonworks Data Platform.
My presentation slides from Hadoop Summit, San Jose, June 28, 2016. See live video at http://www.makedatauseful.com/vid-solving-performance-problems-hadoop/ and follow along for context.
Moving analytic workloads into production - specific technical challenges and best practices for engineering SQL in Hadoop solutions. Highlighting the next generation engineering approaches to the secret sauce we have implemented in the Actian VectorH database.
Notes on data governance in Hadoop. These are my self-study notes, compiled from reading the Hortonworks and Cloudera manuals. Please refer to cloudera.com for the details.
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
Comcast's Streaming Data platform comprises a variety of ingest, transformation, and storage services in the public cloud. Peer-reviewed Apache Avro schemas support end-to-end data governance. We have previously reported (DataWorks Summit 2017) on how we extended Atlas with custom entity and process types for discovery and lineage in the AWS public cloud. Custom Lambda functions notify Atlas of the creation of new entities and new lineage links via asynchronous Kafka messaging.
Recently we were presented the challenge of providing integrated data discovery and lineage across our public cloud datasources and on-prem datasources, both Hadoop-based and traditional data warehouses and RDBMSs. Can Apache Atlas meet this challenge? A resounding yes! This talk will present our federated architecture, with Atlas providing SQL-like, free-text, and graph search across select metadata from all on-prem and public cloud data sources in our purview. Lightweight, custom connectors/bridges identify metadata/lineage changes in underlying sources and publish them to Atlas via the asynchronous API. A portal layer provides Atlas query access and a federation of UIs. Once data of interest is identified via Atlas queries, interfaces specific to underlying sources may be used for special-purpose metadata mining.
While metadata repositories for data discovery and lineage abound, none of them have built-in connectors and listeners for the entire complement of data sources that Comcast and many other large enterprises use to support their business needs. In-house-built solutions typically underestimate the cost of development and maintenance and often suffer from architecture-by-accretion. Atlas' commitment to extensibility, built-in provision of typed, free-text, and graph search, and REST and asynchronous APIs, position it uniquely in the build-vs-buy sweet spot.
A Walk Through the Kimball ETL Subsystems with Oracle Data Integration - Coll...Michael Rainey
Big Data integration is an excellent feature in the Oracle Data Integration product suite (Oracle Data Integrator, GoldenGate, & Enterprise Data Quality). But not all analytics require big data technologies, such as labor cost, revenue, or expense reporting. Ralph Kimball, an original architect of the dimensional model in data warehousing, spent much of his career working to build an enterprise data warehouse methodology that can meet these reporting needs. His book, "The Data Warehouse ETL Toolkit", is a guide for many ETL developers. This session will walk you through his ETL Subsystem categories: Extracting, Cleaning & Conforming, Delivering, and Managing, describing how the Oracle Data Integration products are perfectly suited for the Kimball approach.
Presented at Collaborate16 in Las Vegas.
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsMark Rittman
Presented at the UKOUG Business Analytics SIG Meeting in April 2016, addresses the question as to whether enterprise BI tools such as OBIEE12c are relevant in the world of Gartner BiModal Mode 1 + Mode 2 analytics, and Hybrid cloud/on-premise deployments
Spark is a fast and general engine for large-scale data processing which can solve all of your problems.
… Or can it?
This talk will cover real-world issues encountered during migration of an existing product to Spark infrastructure.
Aimed at software engineers who have just started to evaluate Spark, and at those who are already using it.
Big Data for Oracle Devs - Towards Spark, Real-Time and Predictive AnalyticsMark Rittman
This is a session for Oracle DBAs and developers that looks at cutting-edge big data technologies like Spark, Kafka, etc., and through demos shows how Hadoop is now a real-time platform for fast analytics, data integration and predictive modeling.
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
From DataEngConf 2017 - Everybody wants to get to data faster. As we move from general solutions to specific optimization techniques, the performance impact grows. This talk will discuss how layering in-memory caching, columnar storage and relational caching can combine to provide a substantial improvement in overall data science and analytical workloads. It will include a detailed overview of how you can use Apache Arrow, Calcite and Parquet to achieve performance improvements of multiple orders of magnitude over what is currently possible.
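The relational-caching idea can be sketched in miniature with stdlib sqlite3 (Arrow, Calcite and Parquet themselves are not used here): an aggregate at a coarser granularity is materialized once, and repeat queries read the small cached relation instead of rescanning the raw data.

```python
import sqlite3

# Toy sketch of a relational cache: materialize a coarse aggregate once,
# then answer repeat queries from it instead of rescanning raw data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north" if i % 2 else "south", i % 365, float(i)) for i in range(100_000)],
)

# Materialize the cache: one row per region instead of 100,000 raw rows.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# A repeat aggregate query can now be answered from the 2-row cache ...
cached = dict(conn.execute("SELECT region, total FROM sales_by_region"))
# ... and it agrees with recomputing over the base table.
raw = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(cached == raw)  # True
```

In a real system the cache would live in a columnar format such as Parquet or Arrow, and a query planner such as Calcite would rewrite incoming queries to hit the cached relation automatically.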
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
There are many options for providing SQL access over data in a Hadoop cluster, including proprietary vendor products such as Oracle Big Data SQL on the Oracle Big Data Appliance along with open-source technologies such as Apache Hive, Cloudera Impala and Apache Drill; customers are using those to provide reporting over their Hadoop and relational data platforms, and looking to add capabilities such as calculation engines, data integration and federation along with in-memory caching to create complete analytic platforms. In this session we'll look at the options that are available, compare database vendor solutions with their open-source alternative, and see how emerging vendors are going beyond simple SQL-on-Hadoop products to offer complete "data fabric" solutions that bring together old-world and new-world technologies and allow seamless offloading of archive data and compute work to lower-cost Hadoop platforms.
HBase can be an intimidating beast for someone considering its adoption. For what kinds of workloads is it well suited? How does it integrate into the rest of my application infrastructure? What are the data semantics upon which applications can be built? What are the deployment and operational concerns? In this talk, I'll address each of these questions in turn. As supporting evidence, both high-level application architecture and internal details will be discussed. This is an interactive talk: bring your questions and your use-cases!
Apache Avro and Messaging at Scale in LivePersonLivePerson
This talk covers the challenges we tackled while building our new service-oriented system: what we realized would be bad ideas, what the better approaches to data consistency are, how we used Apache Avro, and what other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
At the StampedeCon 2015 Big Data Conference: Picking your distribution and platform is just the first of many decisions you need to make in order to create a successful data ecosystem. In addition to things like replication factor and node configuration, the choice of file format can have a profound impact on cluster performance. Each of the data formats has different strengths and weaknesses, depending on how you want to store and retrieve your data. For instance, we have observed performance differences on the order of 25x between Parquet and plain text files for certain workloads. However, it isn’t the case that one is always better than the others.
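A stdlib-only toy sketch of why columnar formats such as Parquet can beat plain text for analytic scans (no Parquet library is used; the two layouts are simulated in memory): reading one column from a column-oriented layout touches only that column's bytes, while a row-oriented text file must be parsed line by line even when only one column is needed.

```python
import csv
import io

rows = [(i, "user%d" % (i % 50), i * 1.5) for i in range(1_000)]

# Row-oriented "plain text": one line per record, all columns interleaved.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
text_bytes = len(buf.getvalue().encode())

# Column-oriented layout: each column stored contiguously, as Parquet does.
columns = {
    "id": [r[0] for r in rows],
    "user": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
amount_bytes = len(",".join(str(v) for v in columns["amount"]).encode())

# Summing one column from the columnar layout reads only that column's data.
total = sum(columns["amount"])

# From the text file, every full line must be parsed even for one column.
parsed_total = sum(float(line[2]) for line in csv.reader(io.StringIO(buf.getvalue())))

print(amount_bytes < text_bytes)  # True: one column is far smaller than all rows
print(total == parsed_total)      # True: both paths compute the same answer
```

Real columnar formats add compression and encoding on top of this layout advantage, which is where gains like the 25x figure quoted above can come from.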
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
The Briefing Room with Neil Raden and Teradata
Live Webcast on August 19, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=1acd0b7ace309f765dc3196001d26a5e
Modern enterprises have been able to solve information management woes with the data warehouse, now a staple across the IT landscape that has evolved to a high level of sophistication and maturity with thousands of global implementations. Today’s modern enterprise has a similar challenge; big data and the fast evolution of the Hadoop ecosystem create plenty of new opportunities but also a significant number of operational pains as new solutions emerge.
Register for this episode of The Briefing Room to hear veteran Analyst Neil Raden as he explores the details and nature of Hadoop’s evolution. He’ll be briefed by Cesar Rojas of Teradata, who will share how Teradata solves some of the Hadoop operational challenges. He will also explain how the integration between Hadoop and the data warehouse can help organizations develop a more responsive and robust data management environment.
Visit InsideAnalysis.com for more information.
Big Data is the reality of modern business: from big companies to small ones, everybody is trying to find their own benefit. Big Data technologies are not meant to replace traditional ones, but to complement them. In this presentation you will hear what Big Data and Data Lakes are, and which technologies are most popular in the Big Data world. We will also speak about Hadoop and Spark, how they integrate with traditional systems, and their benefits.
Not Your Father’s Data Warehouse: Breaking Tradition with InnovationInside Analysis
The Briefing Room with Dr. Robin Bloor and Teradata
Live Webcast on May 20, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=f09e84f88e4ca6e0a9179c9a9e930b82
Traditional data warehouses have been the backbone of corporate decision making for over three decades. With the emergence of Big Data and popular technologies like open-source Apache™ Hadoop®, some analysts question the lifespan of the data warehouse and the future role it will play in enterprise information management. But it’s not practical to believe that emerging technologies provide a wholesale replacement of existing technologies and corporate investments in data management. Rather, a better approach is for new innovations and technologies to complement and build upon existing solutions.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor as he explains where tomorrow’s data warehouse fits in the information landscape. He’ll be briefed by Imad Birouty of Teradata, who will highlight the ways in which his company is evolving to meet the challenges presented by different types of data and applications. He will also tout Teradata’s recently-announced Teradata® Database 15 and Teradata® QueryGrid™, an analytics platform that enables data processing across the enterprise.
Visit InsideAnalysis.com for more information.
5 Things that Make Hadoop a Game Changer
Webinar by Elliott Cordo, Caserta Concepts
There is much hype and mystery surrounding Hadoop's role in analytic architecture. In this webinar, Elliott presented, in detail, the services and concepts that make Hadoop a truly unique solution - a game changer for the enterprise. He talked about the real benefits of a distributed file system, the multi-workload processing capabilities enabled by YARN, and the 3 other important things you need to know about Hadoop.
To access the recorded webinar, visit the event site: https://www.brighttalk.com/webcast/9061/131029
For more information on the services and solutions that Caserta Concepts offers, please visit http://casertaconcepts.com/
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
Enterprise Holdings first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points necessary for your own cluster, for example: cloud vs. on-premises, physical vs. virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned about which pieces of our architecture worked well, and rant about those which didn’t. No deep Hadoop knowledge is necessary; the talk is aimed at the architect or executive level.
On Monday evening 15 July, AMIS organized the seminar 'Oracle database 12c revealed'. This evening gave AMIS Oracle professionals their first opportunity to see the innovations in Oracle database 12c in action! The AMIS specialists, who had carried out beta testing for more than a year, showed what is new and how we will put it to use in the coming years!
This presentation was given as a plenary session that evening!
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
2. 2 Copyright Teradata
• Data warehouse strengths
> What is a Data Warehouse?
• Hadoop strengths
• When to use which
> Hadoop
> Data warehouse
Agenda
3.
Three Primary Workloads

Data Hub/Lake:
• Fast raw data ingest
• Archival
• ETL refinery
• Search
• Relaxed SLAs
• Millions of files

Data Warehouse:
• Data models
• Data integration
• Trusted data
• Concurrent users
• Workload mgmt
• Response time

Discovery:
• Easy to use
• Many tools
• Algorithm collections
• Data wrangling
• Business user access
• Semi-production
6.
• A data design pattern, an architecture
> Not necessarily a database
• Definition: Gartner (2005) /Inmon (1992)
> Subject oriented
– Detailed data + modeling of sales, inventory, finance, etc.
> Integrated logical model
– Merged data
– Consistent, standardized data formats and values
> Nonvolatile
– Data stored unmodified for long periods of time
> Time variant
– Record versioning or temporal services
> Persistent storage, not virtual, not federated
What is a Data Warehouse?
Source: Gartner, "Of Data Warehouses, Operational Data Stores, Data Marts and Data 'Outhouses'", Dec 2005;
Inmon, Building the Data Warehouse, Wiley and Sons, 1992
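Two of the defining properties above, nonvolatile and time-variant, can be sketched with a toy record-versioning scheme. This is a hypothetical illustration only: the rows and column names are invented, and it is not Teradata's temporal feature.

```python
from datetime import date

# Toy type-2 versioning: rows are never overwritten; each change closes the
# old version's validity window and opens a new one (nonvolatile + time variant).
history = [
    # (customer, segment, valid_from, valid_to)
    ("c1", "bronze", date(2012, 1, 1), date(2012, 6, 30)),
    ("c1", "gold",   date(2012, 7, 1), date(9999, 12, 31)),
]

def as_of(rows, customer, when):
    """Return the attribute value that was current on a given date."""
    for cust, segment, start, end in rows:
        if cust == customer and start <= when <= end:
            return segment

print(as_of(history, "c1", date(2012, 3, 15)))  # bronze
print(as_of(history, "c1", date(2013, 1, 1)))   # gold
```

Because old versions persist, any past state of the data can be reconstructed, which is exactly what "time variant" buys the warehouse.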
7.
Data Warehouse Design Pattern
By Definition (0=none, 1=poor, 2=limited, 3=average, 4=robust, 5=outstanding)

Criterion | Data Warehouse | Hadoop
Subject oriented | 5 | 0
Detailed data | 5 | 5
Modeled by business subject | 5 | 0
Integrated | 5 | 0
Merged, deduplicated data | 5 | 0
Standardized data formats and values | 5 | 0
Nonvolatile storage | 5 | 5
Time variant: record versions, temporal | 5 | 0
Persistent storage | 5 | 5
8.
NoSchema, Schema-on-Read, Complex Schemas

Single file (schema-on-read):
• No schema, no joins
• One source
• Raw data
• 3-5 uses

Data Marts (schema-on-read):
• Star and snowflake schemas
• 2-4 fact table joins
• Multiple sources
• Raw data, unknown data
• Key value stores

Data Warehouse (schema-on-write):
• 5K-10K tables
• 20-50 way joins
• Cross-organization
• Pre-integrated, cleansed
• Referential integrity
• Many applications

[Diagram: warehouse subject areas such as Events, Locations, Finance, Transaction, Session, Orders, Inventory, Call Center, POS]
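The schema-on-read vs. schema-on-write contrast can be shown in a few lines of code. This is a hypothetical sketch; the record fields (`user`, `amount`, `ts`) are invented for illustration.

```python
import json

raw_records = [
    '{"user": "u1", "amount": "19.99", "ts": "2013-01-01"}',
    '{"user": "u2", "extra_field": true, "amount": 5}',   # shape drifts freely
]

# Schema-on-read: land the raw strings untouched, impose structure per query.
def read_amounts(lines):
    for line in lines:
        rec = json.loads(line)              # structure applied at read time
        yield float(rec.get("amount", 0))   # each consumer picks its own view

print(round(sum(read_amounts(raw_records)), 2))  # 24.99

# Schema-on-write: validate and convert before anything is stored.
def load_row(line):
    rec = json.loads(line)
    if set(rec) != {"user", "amount", "ts"}:   # reject shape drift at load time
        raise ValueError("record violates table schema")
    return (rec["user"], float(rec["amount"]), rec["ts"])
```

The second raw record passes schema-on-read but would be rejected by `load_row`: that is the trade-off between the flexibility of a data lake and the guarantees of a warehouse.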
9.
• Not a database
> No schema, indexes, optimizer
> No separation of code and data structure
> Hadoop uses objects and files
– Not rows and columns
• Hive helps a little
> Limited SQL
> Limited metadata
• Not high performance
• Not fully interactive queries
What Hadoop is Not
See also http://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
http://blogs.gartner.com/donald-feinberg/2014/12/22/a-database-by-any-other-name/
10.
ACID Advantages of an RDBMS
• Atomicity: apply all changes or none
• Consistency: rollback on errors
• Isolation: one update at a time
• Durability: transactions survive crashes

• Guarantees database actions are processed reliably
• Ensures query result accuracy
• Supports updates and deletes
• Needed for applications that require 100% consistency
> Banks, finance, inventory, etc.
> Maybe not for Facebook, Twitter, etc.
• Data you can trust
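Atomicity and consistency can be seen in miniature with Python's built-in sqlite3. This is only a sketch: the accounts table and the transfer logic are invented, and a production RDBMS adds much stronger isolation and durability machinery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # atomicity: commit on success, roll back on any exception
        conn.execute("UPDATE accounts SET balance = balance - 150"
                     " WHERE name = 'alice'")
        # consistency rule enforced here: no negative balances allowed
        (bal,) = conn.execute("SELECT balance FROM accounts"
                              " WHERE name = 'alice'").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150"
                     " WHERE name = 'bob'")
except ValueError:
    pass  # the whole transfer was rolled back, not half of it

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0} -- neither update survived
```

Without the transaction, the first UPDATE could have been applied while the second failed, which is exactly the half-applied state that banks and inventory systems cannot tolerate.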
11.
Hadoop’s Biggest Differentiators
• Capture and ETL
• Long-term archive
• Cheap, commodity hardware
[Diagram: Hadoop feeding the Data Warehouse, which handles integration and analytics]
13.
When We’re Too Small for Hadoop ETL
• Avoid hand-coded transforms
• 2 ETL servers do the job
• Prefer tool-based ETL
• ETL is working well
14.
When We Need Massive Data Integration
• Dozens of ETL servers
• High-velocity real-time data
• 10s-100s of TB/day
• The risk is worth the reward
15.
When In-database ELT Works Well
• Reference data look-ups
• Joins for derived data
• Lots of derived data
• Service-level goals to meet
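A miniature in-database ELT pass, using Python's sqlite3 as a stand-in for an MPP engine. This is a hedged sketch: the table and column names are invented, but the shape is the one described above, with raw data already loaded and a set-based SQL transform (reference-data lookup plus a join producing derived data) running inside the database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_sales (store_id INTEGER, amount REAL)")
db.execute("CREATE TABLE stores (store_id INTEGER, region TEXT)")  # reference data
db.executemany("INSERT INTO raw_sales VALUES (?, ?)",
               [(1, 10.0), (1, 5.0), (2, 7.5)])
db.executemany("INSERT INTO stores VALUES (?, ?)",
               [(1, "WEST"), (2, "EAST")])

# ELT: the transform is one declarative statement executed in the engine,
# not row-at-a-time code on a separate ETL server.
db.execute("""
    CREATE TABLE sales_by_region AS
    SELECT s.region, SUM(r.amount) AS total
    FROM raw_sales r JOIN stores s ON r.store_id = s.store_id
    GROUP BY s.region
""")
rows = dict(db.execute("SELECT region, total FROM sales_by_region"))
print(sorted(rows.items()))  # [('EAST', 7.5), ('WEST', 15.0)]
```

Because the lookup and join run where the data lives, the database optimizer and workload management can hold the transform to a service-level goal.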
16.
When to Use Which: It Depends

Factor | In-Database ELT | Hadoop
Reference data | Lookups; joins | —
Transformations | Structured data; ELT modules; SQL can do it | Unstructured; some ETL modules; do it yourself
Service-level goals | Predictable; system management | —
Data security | Robust | —
Costs | — | Commodity hardware
Data quality | Governance, MDM | Low quality/trust OK
Data volume | High volume | Extreme volume
Offload ELT | — | Migration costs
Agility | — | No governance
18.
• Commodity low cost hardware
• Many programming languages
> But mostly it’s Java
• Free open source
• Any data structure
• Scale-out to petabytes + parallelism
Hadoop Strengths
19.
• ETL on steroids
• Economically “keep files forever”
> Queryable
• File based reporting and analytics
• Backup and archival storage
> Databases, files, development
Hadoop: the Data Hub
20.
• Temporary data, data exhaust
• Data mining/exploration
> 1000s of continuous variables
> Linear algebra
> Graph mining
> Machine learning
> Random forest, decision trees
> Markov chains
• Not all data mining is MapReduce
> Many things work better in MPP RDBMS
> In-database SAS, R, Fuzzy Logix
> It depends
Where MapReduce Excels
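The MapReduce programming model itself fits in a few lines of plain Python (word count, the canonical example). This only illustrates the map/shuffle/reduce pattern; real Hadoop distributes each phase across a cluster and handles the shuffle for you.

```python
from collections import defaultdict
from itertools import chain

docs = ["the cat sat", "the cat ran", "a dog ran"]

# Map phase: each input record independently emits (key, value) pairs.
def mapper(doc):
    for word in doc.split():
        yield (word, 1)

# Shuffle: group all emitted values by key (the framework does this in Hadoop).
groups = defaultdict(list)
for key, value in chain.from_iterable(mapper(d) for d in docs):
    groups[key].append(value)

# Reduce phase: fold each key's values into a single result.
counts = {key: sum(values) for key, values in groups.items()}
print(counts["the"], counts["ran"])  # 2 2
```

Anything expressible as independent per-record map work plus per-key aggregation parallelizes this way, which is why the algorithm families above fit the model.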
21.
• Easy to work on non-relational data
> Java data types
> JSON, objects
• Hadoop is written in Java
> Compatible APIs, skills, concepts, frameworks, scripts
• Huge open source factories
> Apache, GitHub, Eclipse, SourceForge, etc.
> Assorted compression algorithms
• People
> 9M-10M java programmers
> Web tutorials – extensive “how to” topics
> University student research
Developer Advantages with Hadoop
22.
• Raw data format provides complete flexibility
• Non-traditional data types easily supported
> Graph, text, weblog, etc.
• No upfront ETL required
• No data loading required
• Flexible: late binding lets the data scientist choose
NoSchema Advantages
Weblog example (raw record):
41521390 2013-01-01 00:25:42 2.111.94.18
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us) AppleWebKit/533.19.4 (KHTML, like Gecko) Version/5.0.3 Safari/533.19.4
"http://www.cokstate.edu/welcome/"
"https://www.google.com/#sclient=psyab&hl=en&source=hp&q=oklahoma+state&pbx=1&oq"
Note: schema-on-read is not always a good solution; there are many pitfalls.
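Late binding on a raw record like the weblog above might look like this. It is a hypothetical sketch: the tab-separated column layout and field names are assumed for illustration, not taken from a real log format.

```python
# One raw line kept at full fidelity; fields are bound only when queried.
raw = ('41521390\t2013-01-01 00:25:42\t2.111.94.18\t'
       'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_5; en-us)\t'
       '"http://www.cokstate.edu/welcome/"')

def project(line, *fields):
    """Late binding: this consumer's schema exists only inside this function."""
    cols = line.split('\t')
    named = {"id": cols[0], "ts": cols[1], "ip": cols[2],
             "agent": cols[3], "url": cols[4].strip('"')}
    return tuple(named[f] for f in fields)

print(project(raw, "ip", "url"))  # ('2.111.94.18', 'http://www.cokstate.edu/welcome/')
```

Each analyst can bind a different schema to the same stored bytes, which is the flexibility being claimed; the pitfall is that every consumer must now get the parsing right independently.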
23.
Attributes Favoring Hadoop

Reason | Description
Cost | Low-cost, low-value data before refinement
Multi-structured data ingest | Raw weblogs, Twitter, Facebook, mobile, PST files, etc.
Data depth | High data volume, few users, high signal-to-noise ratio
Non-SQL analytics | Complex processes, pipeline transforms, random forests, Markov chains, enormous arrays, etc.
Flexibility, autonomy | Exploratory analysis with little governance; fast, short-term turnaround
Ugly data | Videos, satellite images, format conversions (PDF to text)
24.
Key Considerations

MPP RDBMS | Hadoop
Stable schema | Evolving schema
Structured data | Structure agnostic
Full ANSI SQL | Flexible programming
Iterative analysis | Batch analysis
Fine-grain security | N/A
Cleansed data | Raw data
Seeks | Scans
Updates/deletes | Ingest
Service-level agreements | Flexibility
Core data | Source files
Complex joins | Complex processing
Efficient CPU and IO | Low-cost storage
25.
• YARN and Tez
• Queries on flat files!
• Parallel scanning engine
• Developer community
• Complex parallel processing
• Fast ingest of raw data
• Long term archives at full fidelity
• Good scalability
What I Like About Hadoop
26.
• Start with workload requirements
> Map the tool capabilities to the requirement
• Hadoop is a DataHub, a Data Lake
> Not a database or data warehouse
> Exploit Hadoop’s strengths
• Combine the data warehouse and Hadoop
> Two tool sets solve more objectives
> Better together
Summary