Speaker: Marcel Kornacker
As data is ingested into Apache Hadoop at an increasing rate from a diverse range of data sources, it is becoming more and more important for users that new data be accessible for analysis as quickly as possible—because “data freshness” can have a direct impact on business results.
In the traditional ETL process, raw data is transformed from the source into a target schema, possibly requiring flattening and condensing, and then loaded into an MPP DBMS. However, this approach has multiple drawbacks that make it unsuitable for real-time, “at-source” analytics—for example, the “ETL lag” reduces data freshness, and the inherent complexity of the process makes it costly to deploy and maintain, and reduces the speed at which new analytic applications can be introduced.
In this talk, attendees will learn about Impala’s approach to on-the-fly, automatic data transformation, which in conjunction with the ability to handle nested structures such as JSON and XML documents, addresses the needs of at-source analytics—including direct querying of your input schema, immediate querying of data as it lands in HDFS, and high performance on par with specialized engines. This performance level is attained in spite of the most challenging and diverse input formats, which are addressed through an automated background conversion process into Parquet, the high-performance, open source columnar format that has been widely adopted across the Hadoop ecosystem.
In this talk, attendees will learn about Impala’s upcoming features that will enable at-source analytics: support for nested structures such as JSON and XML documents, which allows direct querying of the source schema; automated background file format conversion into Parquet, the high-performance, open source columnar format that has been widely adopted across the Hadoop ecosystem; and automated creation of declaratively-specified derived data for simplified data cleansing and transformation.
Slides for presentation on Cloudera Impala I gave at the DC/NOVA Java Users Group on 7/9/2013. It is a slightly updated set of slides from the ones I uploaded a few months ago on 4/19/2013. It covers version 1.0.1 and also includes some new slides on HortonWorks' Stinger Initiative.
An Introduction to Cloudera Impala, shows how Impala works, and the internal processing of query of Impala, including architecture, frontend, query compilation, backend, code generation, HDFS-related stuff and performance comparison.
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
This is a technical deep dive about Cloudera Impala, the project that makes scalable parallel databse technology available to the Hadoop community for the first time. Impala is an open-sourced code base that allows users to issue low-latency queries to data stored in HDFS and Apache HBase using familiar SQL operators.
Presenter Marcel Kornacker, creator of Impala, begins with an overview of Impala from the user's perspective, followed by an overview of Impala's architecture and implementation, and will conclude with a comparison of Impala with Dremel and Apache Hive, commercial MapReduce alternatives and traditional data warehouse infrastructure.
Impala Architecture Presentation at Toronto Hadoop User Group, in January 2014 by Mark Grover.
Event details:
http://www.meetup.com/TorontoHUG/events/150328602/
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
Presentations from the Cloudera Impala meetup on Aug 20 2013:
- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala
Slides for presentation on Cloudera Impala I gave at the DC/NOVA Java Users Group on 7/9/2013. It is a slightly updated set of slides from the ones I uploaded a few months ago on 4/19/2013. It covers version 1.0.1 and also includes some new slides on HortonWorks' Stinger Initiative.
An Introduction to Cloudera Impala, shows how Impala works, and the internal processing of query of Impala, including architecture, frontend, query compilation, backend, code generation, HDFS-related stuff and performance comparison.
Cloudera Impala: A Modern SQL Engine for HadoopCloudera, Inc.
This is a technical deep dive about Cloudera Impala, the project that makes scalable parallel databse technology available to the Hadoop community for the first time. Impala is an open-sourced code base that allows users to issue low-latency queries to data stored in HDFS and Apache HBase using familiar SQL operators.
Presenter Marcel Kornacker, creator of Impala, begins with an overview of Impala from the user's perspective, followed by an overview of Impala's architecture and implementation, and will conclude with a comparison of Impala with Dremel and Apache Hive, commercial MapReduce alternatives and traditional data warehouse infrastructure.
Impala Architecture Presentation at Toronto Hadoop User Group, in January 2014 by Mark Grover.
Event details:
http://www.meetup.com/TorontoHUG/events/150328602/
Presentations from the Cloudera Impala meetup on Aug 20 2013Cloudera, Inc.
Presentations from the Cloudera Impala meetup on Aug 20 2013:
- Nong Li on Parquet+Impala and UDF support
- Henry Robinson on performance tuning for Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
Cloudera Impala is an open-source under Apache Licence enable real-time, interactive analytical SQL queries of the data stored in HBase or HDFS. The work was inspired by Google Dremel paper which is also the basis for Google BigQuery. It provide access same unified storage platform base on it's own distributed query engine but does not use mapreduce. In addition, it use also the same metadata, SQL syntax (HiveQL-like) ODBC driver and user interface (Hue Beeswax) as Hive. Besides the traditional Hadoop approach, aim to provide low-cost solution for resiliency and batch-oriented distributed data processing, we found more and more effort in the Big Data world pursuing the right solution for ad-hoc, fast queries and realtime data processing for large datasets. In this presentation, we'll explore how to run interactive queries inside Impala, advantages of the approach, architecture and understand how it optimizes data systems including also practical performance analysis.
James Kinley from Cloudera:
An introduction to Cloudera Impala. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries.
The link to the video: http://zurichtechtalks.ch/post/37339409724/an-introduction-to-cloudera-impala-sql-on-top-of
This talk was held at the 11th meeting on April 7 2014 by Marcel Kornacker.
Impala (impala.io) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.
In this presentation, Vineet will be explaining case study of one of my customers using Spark to migrate terabytes of data from GPFS into Hive tables. The ETL pipeline was built purely using Spark. The pipeline extracted target (Hive) table properties such as - identification of Hive Date/Timestamp columns, whether target table is partitioned or non-partitioned, target storage formats (Parquet or Avro) and source to target columns mappings. These target tables contain few to hundreds of columns and non standard date fomats into Hive standard timestamp format.
Cloudera Impala: The Open Source, Distributed SQL Query Engine for Big Data. The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Apache Hadoop cluster. With Impala, the Hadoop ecosystem now has an open-source codebase that helps users query data stored in Hadoop-based enterprise data hubs in real time, using familiar SQL syntax.
This talk will begin with an overview of the challenges organizations face as they collect and process more data than ever before, followed by an overview of Impala from the user's perspective and a dive into Impala's architecture. It concludes with stories of how Cloudera's customers are using Impala and the benefits they see.
Learn how Cloudera Impala empowers you to:
- Perform interactive, real-time analysis directly on source data stored in Hadoop
- Interact with data in HDFS and HBase at the “speed of thought”
- Reduce data movement between systems & eliminate double storage
Interested in learning Hadoop, but you’re overwhelmed by the number of components in the Hadoop ecosystem? You’d like to get some hands on experience with Hadoop but you don’t know Linux or Java? This session will focus on giving a high level explanation of Hive and HiveQL and how you can use them to get started with Hadoop without knowing Linux or Java.
End-to-end Data Governance with Apache Avro and AtlasDataWorks Summit
Aeolus is Comcast’s new internal Big Data system for providing access to an integrated view of a wide variety of high-quality, near-real-time and batch data. Such integration can enable data scientists to uncover otherwise hidden trends, anomalies, and powerful predictors of business successes and failures. But integrating data across silos in a large enterprise is fraught with peril. There typically are few standards on naming conventions and data representation, and spotty documentation at best. The old rule of thumb often applies: 70% of the analysts’ time goes into data wrangling, while only 30% goes toward the actual analyses and simulations. The goal of the Athene Data Governance Platform within Aeolus is to invert this ratio. This talk will explain how Comcast is using Apache Avro and Atlas for end-to-end data governance, the challenges faced, and methods used to address these challenges.
Avro provides a lingua franca for data representation, data integration, and schema evolution. All data published for community consumption must have an associated avro schema in Atlas. Every step in its journey through Aeolus, in flight or at rest, is captured in Atlas. Atlas’ extensibility has allowed us to add or update various entity types (e.g., avro schemas, kafka topics, object store pseudo-directories) and lineage types (e.g., storing streaming data in object storage; embellishing and re-publishing streaming data; performing aggregations and other transformations on data at rest; and evolution of schemas with compatibility flags). Transformation services notify Atlas of lineage links via custom asynchronous kafka messaging.
Atlas provides self-service data discovery and lineage browsing and querying, via full-text search, DSL query language, or gremlin graph query language. Example queries: “Where is data from kafka topic X stored?” “Display the journey of data currently stored in pseudo-directory X since it entered the Aeolus system”. “Show me all earlier versions of schema S, and whether they are forward/backward compatible with each other.”
Real-time Big Data Analytics Engine using ImpalaJason Shih
Cloudera Impala is an open-source under Apache Licence enable real-time, interactive analytical SQL queries of the data stored in HBase or HDFS. The work was inspired by Google Dremel paper which is also the basis for Google BigQuery. It provide access same unified storage platform base on it's own distributed query engine but does not use mapreduce. In addition, it use also the same metadata, SQL syntax (HiveQL-like) ODBC driver and user interface (Hue Beeswax) as Hive. Besides the traditional Hadoop approach, aim to provide low-cost solution for resiliency and batch-oriented distributed data processing, we found more and more effort in the Big Data world pursuing the right solution for ad-hoc, fast queries and realtime data processing for large datasets. In this presentation, we'll explore how to run interactive queries inside Impala, advantages of the approach, architecture and understand how it optimizes data systems including also practical performance analysis.
James Kinley from Cloudera:
An introduction to Cloudera Impala. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries.
The link to the video: http://zurichtechtalks.ch/post/37339409724/an-introduction-to-cloudera-impala-sql-on-top-of
This talk was held at the 11th meeting on April 7 2014 by Marcel Kornacker.
Impala (impala.io) raises the bar for SQL query performance on Apache Hadoop. With Impala, you can query Hadoop data – including SELECT, JOIN, and aggregate functions – in real time to do BI-style analysis. As a result, Impala makes a Hadoop-based enterprise data hub function like an enterprise data warehouse for native Big Data.
In this presentation, Vineet will be explaining case study of one of my customers using Spark to migrate terabytes of data from GPFS into Hive tables. The ETL pipeline was built purely using Spark. The pipeline extracted target (Hive) table properties such as - identification of Hive Date/Timestamp columns, whether target table is partitioned or non-partitioned, target storage formats (Parquet or Avro) and source to target columns mappings. These target tables contain few to hundreds of columns and non standard date fomats into Hive standard timestamp format.
Cloudera Impala: The Open Source, Distributed SQL Query Engine for Big Data. The Cloudera Impala project is pioneering the next generation of Hadoop capabilities: the convergence of fast SQL queries with the capacity, scalability, and flexibility of a Apache Hadoop cluster. With Impala, the Hadoop ecosystem now has an open-source codebase that helps users query data stored in Hadoop-based enterprise data hubs in real time, using familiar SQL syntax.
This talk will begin with an overview of the challenges organizations face as they collect and process more data than ever before, followed by an overview of Impala from the user's perspective and a dive into Impala's architecture. It concludes with stories of how Cloudera's customers are using Impala and the benefits they see.
Learn how Cloudera Impala empowers you to:
- Perform interactive, real-time analysis directly on source data stored in Hadoop
- Interact with data in HDFS and HBase at the “speed of thought”
- Reduce data movement between systems & eliminate double storage
Interested in learning Hadoop, but you’re overwhelmed by the number of components in the Hadoop ecosystem? You’d like to get some hands on experience with Hadoop but you don’t know Linux or Java? This session will focus on giving a high level explanation of Hive and HiveQL and how you can use them to get started with Hadoop without knowing Linux or Java.
End-to-end Data Governance with Apache Avro and AtlasDataWorks Summit
Aeolus is Comcast’s new internal Big Data system for providing access to an integrated view of a wide variety of high-quality, near-real-time and batch data. Such integration can enable data scientists to uncover otherwise hidden trends, anomalies, and powerful predictors of business successes and failures. But integrating data across silos in a large enterprise is fraught with peril. There typically are few standards on naming conventions and data representation, and spotty documentation at best. The old rule of thumb often applies: 70% of the analysts’ time goes into data wrangling, while only 30% goes toward the actual analyses and simulations. The goal of the Athene Data Governance Platform within Aeolus is to invert this ratio. This talk will explain how Comcast is using Apache Avro and Atlas for end-to-end data governance, the challenges faced, and methods used to address these challenges.
Avro provides a lingua franca for data representation, data integration, and schema evolution. All data published for community consumption must have an associated avro schema in Atlas. Every step in its journey through Aeolus, in flight or at rest, is captured in Atlas. Atlas’ extensibility has allowed us to add or update various entity types (e.g., avro schemas, kafka topics, object store pseudo-directories) and lineage types (e.g., storing streaming data in object storage; embellishing and re-publishing streaming data; performing aggregations and other transformations on data at rest; and evolution of schemas with compatibility flags). Transformation services notify Atlas of lineage links via custom asynchronous kafka messaging.
Atlas provides self-service data discovery and lineage browsing and querying, via full-text search, DSL query language, or gremlin graph query language. Example queries: “Where is data from kafka topic X stored?” “Display the journey of data currently stored in pseudo-directory X since it entered the Aeolus system”. “Show me all earlier versions of schema S, and whether they are forward/backward compatible with each other.”
We present our solution for building an AI Architecture that provides engineering teams the ability to leverage data to drive insight and help our customers solve their problems. We started with siloed data, entities that were described differently by each product, different formats, complicated security and access schemes, data spread over numerous locations and systems.
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Lucas Jellema
Dozens of Oracle Databases - each health center location has one on its local server with the same data model and the same set of applications. These databases have to be centralized and cloudified and also be consolidated into one or as few databases as possible. To lower costs, ease operations and enable innovation. Each location can access only its own data, applications do not have to be changed and different locations can run different versions of applications and database objects. This is the story of a critical migration. About the cloud ready analysis, the Proofs of Concept with Oracle Database features VPD and Edition Based Redefinition, the scalability investigation, the redesign of change management, rollout and operational management processes and the careful modernization of a 25 year old platform on the latest database release and a shiny new, fully automated cloud platform.This is the story of an organization that had state of the art systems in the mid-90s. And they have these same systems today - no longer state of the art. They can keep the systems alive, but barely, and at increasing cost. In the Fall of 2020, we started an investigation into the feasibility of bringing the 100s of databases from each of the locations together, in a central location, in the cloud and finally: consolidated into one or at least as few database instances as possible. Using Oracle Database Virtual Private Database and Edition Based Redefinition, a smart database connection configuration in each site and a limited reimplementation of non-cloud/non-consolidated mechanisms (interaction with local file system for example) we have designed and proven a working new design and migration approach.
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
Element Fleet has the largest benchmark database in our industry and we needed a robust and linearly scalable platform to turn this data into actionable insights for our customers. The platform needed to support advanced analytics, streaming data sets, and traditional business intelligence use cases.
In this presentation, we will discuss how we built a single, unified platform for both Advanced Analytics and traditional Business Intelligence using Cassandra on DSE. With Cassandra as our foundation, we are able to plug in the appropriate technology to meet varied use cases. The platform we’ve built supports real-time streaming (Spark Streaming/Kafka), batch and streaming analytics (PySpark, Spark Streaming), and traditional BI/data warehousing (C*/FiloDB). In this talk, we are going to explore the entire tech stack and the challenges we faced trying support the above use cases. We will specifically discuss how we ingest and analyze IoT (vehicle telematics data) in real-time and batch, combine data from multiple data sources into to single data model, and support standardized and ah-hoc reporting requirements.
About the Speaker
Jim Peregord Vice President - Analytics, Business Intelligence, Data Management, Element Corp.
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
Speaker: Adam Warrington (Cloudera)
The Kite SDK is a set of libraries and tools focused on making it easier to build systems on top of the Hadoop ecosystem. HBase support has recently been added to the Kite SDK Data Module, which allows a developer to model and access data in HBase consistent with how they would model data in HDFS using Kite. This talk will focus on Kite's HBase support by covering Kite basics and moving through the specifics of working with HBase as a data source. This feature overview will be supplemented by specifics of how that feature is being used in production applications at Cloudera.
Apache Hive is a rapidly evolving project which continues to enjoy great adoption in the big data ecosystem. As Hive continues to grow its support for analytics, reporting, and interactive query, the community is hard at work in improving it along with many different dimensions and use cases. This talk will provide an overview of the latest and greatest features and optimizations which have landed in the project over the last year. Materialized views, the extension of ACID semantics to non-ORC data, and workload management are some noteworthy new features.
We will discuss optimizations which provide major performance gains as well as integration with other big data technologies such as Apache Spark, Druid, and Kafka. The talk will also provide a glimpse of what is expected to come in the near future.
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
Data Lakes have become a new tool in building modern data warehouse architectures. In this presentation we will introduce Microsoft's Azure Data Lake offering and its new big data processing language called U-SQL that makes Big Data Processing easy by combining the declarativity of SQL with the extensibility of C#. We will give you an initial introduction to U-SQL by explaining why we introduced U-SQL and showing with an example of how to analyze some tweet data with U-SQL and its extensibility capabilities and take you on an introductory tour of U-SQL that is geared towards existing SQL users.
slides for SQL Saturday 635, Vancouver BC, Aug 2017
My talk from Database Camp 2016 at the United Nations. I focus on how we can bridge the gap between OLTP and OLAP workloads and discuss a very promising new technology called Apache Kudu.
You've seen the basic 2-stage example Spark Programs, and now you're ready to move on to something larger. I'll go over lessons I've learned for writing efficient Spark programs, from design patterns to debugging tips.
The slides are largely just talking points for a live presentation, but hopefully you can still make sense of them for offline viewing as well.
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...DataStax
Cassandra is moving away from Thrift to CQL protocol. Along with this change, Thrift based client drivers are not actively supported any more, nor are they exposed to new Cassandra features. This brings many existing Cassandra users into a situation that they need to migrate their Thrift based application to CQL based. A complete solution to Thrift-to-CQL migration requires changes in 3 areas: application code, data model, and existing data. In this session we will focus on how we can migrate existing data effectively in order to reflect data model changes and give you the necessary tools to build from what you have learned to apply it in your environments.
Learning Objectives:
1) Gain an insight of what Cassandra storage engine looks like for version 2.2 and downward
2) Get better idea of how Thrift and CQL difference affects table design
3) Explore an approach to effectively migrate dynamically generated (Thrift) data into a static defined (CQL) table
About the Speaker
Yabin Meng Apache Cassandra / DataStax Enterprise Consultant, Pythian
Yabin is a DataStax certified Architect, Administrator, and Developer. He has been in IT industry for more than 15 years and much of his career is around database related technologies. He has been working with Cassandra for about 2 years and is currently a Cassandra/DSE consultant at Pythian.
Similar to Friction-free ETL: Automating data transformation with Impala | Strata + Hadoop World London 2015 (20)
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
This annual program recognizes organizations who are moving swiftly towards the future and building innovative solutions by making what was impossible yesterday, possible today.
The winning organizations' implementations demonstrate outstanding achievements in fulfilling their mission, technical advancement, and overall impact.
The 2021 Data Impact Awards recognize organizations' achievements with the Cloudera Data Platform in seven categories:
Data Lifecycle Connection
Data for Enterprise AI
Cloud Innovation
Security & Governance Leadership
People First
Data for Good
Industry Transformation
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
Cloudera is proud to present the 2020 Data Impact Awards Finalists. This annual program recognizes organizations running the Cloudera platform for the applications they've built and the impact their data projects have on their organizations, their industries, and the world. Nominations were evaluated by a panel of independent thought-leaders and expert industry analysts, who then selected the finalists and winners. Winners exemplify the most-cutting edge data projects and represent innovation and leadership in their respective industries.
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
Cloudera Fast Forward Labs’ latest research report and prototype explore learning with limited labeled data. This capability relaxes the stringent labeled data requirement in supervised machine learning and opens up new product possibilities. It is industry invariant, addresses the labeling pain point and enables applications to be built faster and more efficiently.
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
In this session, we will cover how to move beyond structured, curated reports based on known questions on known data, to an ad-hoc exploration of all data to optimize business processes and into the unknown questions on unknown data, where machine learning and statistically motivated predictive analytics are shaping business strategy.
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
Watch this webinar to understand how Hortonworks DataFlow (HDF) has evolved into the new Cloudera DataFlow (CDF). Learn about key capabilities that CDF delivers such as -
-Powerful data ingestion powered by Apache NiFi
-Edge data collection by Apache MiNiFi
-IoT-scale streaming data processing with Apache Kafka
-Enterprise services to offer unified security and governance from edge-to-enterprise
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
Cloudera’s Data Science Workbench (CDSW) is available for Hortonworks Data Platform (HDP) clusters for secure, collaborative data science at scale. During this webinar, we provide an introductory tour of CDSW and a demonstration of a machine learning workflow using CDSW on HDP.
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
Join Cloudera as we outline how we use Cloudera technology to strengthen sales engagement, minimize marketing waste, and empower line of business leaders to drive successful outcomes.
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
Join us to learn about the challenges of legacy data warehousing, the goals of modern data warehousing, and the design patterns and frameworks that help to accelerate modernization efforts.
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on AWS. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
Cloudera SDX is by no means no restricted to just the platform; it extends well beyond. In this webinar, we show you how Bardess Group’s Zero2Hero solution leverages the shared data experience to coordinate Cloudera, Trifacta, and Qlik to deliver complete customer insight.
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
Join Cloudera Fast Forward Labs Research Engineer, Mike Lee Williams, to hear about their latest research report and prototype on Federated Learning. Learn more about what it is, when it’s applicable, how it works, and the current landscape of tools and libraries.
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
451 Research Analyst Sheryl Kingstone, and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms.
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
In this webinar, you will learn how Cloudera and BAH riskCanvas can help you build a modern AML platform that reduces false positive rates, investigation costs, technology sprawl, and regulatory risk.
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
How can companies integrate data science into their businesses more effectively? Watch this recorded webinar and demonstration to hear more about operationalizing data science with Cloudera Data Science Workbench on Cazena’s fully-managed cloud platform.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.