This document summarizes Cosmin Lehene's presentation on Big Data with HBase and Hadoop at Adobe. The presentation discusses how Adobe uses Hadoop and HBase to analyze large amounts of data from sources like video logs, Flash usage logs, and image metadata, and gives examples of how Adobe uses this analysis to improve products like Adobe Media Player and Photoshop and to gain business intelligence. It also covers HBase data modeling, MapReduce workflows, and the scaling challenges Adobe encountered.
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collection, delivery, and processing. In this presentation, Jun Rao, Co-founder of Confluent, gives a deep dive into some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
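For a concrete flavor of how compaction is switched on, here is a minimal sketch using the Kafka Java AdminClient; it assumes a recent Java client, a broker at localhost:9092, and an illustrative topic name and sizing (none of these come from the talk):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // cleanup.policy=compact tells the broker to retain at least the
            // latest record per key instead of discarding by age/size alone,
            // so the topic behaves like a changelog of an updatable table.
            NewTopic topic = new NewTopic("db-records", 3, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

Producers then keep the database primary key as the record key; a consumer that reads the compacted topic from the beginning reconstructs the latest state of every row.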
LinkedIn leverages the Apache Hadoop ecosystem for its big data analytics. Steady growth of the member base at LinkedIn, along with their social activities, results in exponential growth of the analytics infrastructure. Innovations in analytics tooling lead to heavier workloads on the clusters, which generate more data, which in turn encourages innovations in tooling and more workloads. Thus, the infrastructure remains under constant growth pressure. Heterogeneous environments, with a variety of hardware and diverse workloads, make the task even more challenging.
This talk will tell the story of how we doubled our Hadoop infrastructure twice in the past two years.
• We will outline our main use cases and historical rates of cluster growth in multiple dimensions.
• We will focus on optimizations, configuration improvements, performance monitoring and architectural decisions we undertook to allow the infrastructure to keep pace with business needs.
• The topics include improvements in HDFS NameNode performance, and fine tuning of block report processing, the block balancer, and the namespace checkpointer.
• We will reveal a study on the optimal storage device for HDFS persistent journals (SATA vs. SAS vs. SSD vs. RAID).
• We will also describe the Satellite Cluster project, which allowed us to double the number of objects stored on one logical cluster by splitting an HDFS cluster into two partitions, without the use of federation and with practically no code changes.
• Finally, we will take a peek at our future goals, requirements, and growth perspectives.
SPEAKERS
Konstantin Shvachko, Sr Staff Software Engineer, LinkedIn
Erik Krogen, Senior Software Engineer, LinkedIn
Scaling Big Data Mining Infrastructure: The Twitter Experience (DataWorks Summit)
The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this talk, we’ll discuss the evolution of our infrastructure and the development of capabilities for data mining on “big data”. We’ll share our experiences as a case study, but make recommendations for best practices and point out opportunities for future work.
Finite state automata and transducers made it into Lucene fairly recently, but already show a very promising impact on search performance. This data structure is rarely exploited because it is commonly (and unfairly) associated with high complexity. During the talk, I will try to show that automata and transducers are in fact very simple, their construction can be very efficient (memory and time-wise) and their field of applications very broad.
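As a small taste of that simplicity, here is a sketch that builds a finite state transducer mapping sorted terms to long outputs; it assumes a Lucene 8.x-era API (later releases renamed Builder to FSTCompiler), and the terms and values are made up:

```java
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstExample {
    public static void main(String[] args) throws Exception {
        // A transducer: each accepted input string maps to a long output.
        Builder<Long> builder =
                new Builder<>(FST.INPUT_TYPE.BYTE1, PositiveIntOutputs.getSingleton());
        IntsRefBuilder scratch = new IntsRefBuilder();
        // Inputs must be added in sorted order; shared prefixes and suffixes
        // are collapsed, which is what keeps the structure so compact.
        builder.add(Util.toIntsRef(new BytesRef("cat"), scratch), 5L);
        builder.add(Util.toIntsRef(new BytesRef("dog"), scratch), 7L);
        builder.add(Util.toIntsRef(new BytesRef("dogs"), scratch), 13L);
        FST<Long> fst = builder.finish();
        System.out.println(Util.get(fst, new BytesRef("dogs"))); // 13
    }
}
```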
How to boost your data management with Dremio? (Vincent Terrasi)
Works with any source. Relational, non-relational, 3rd-party apps. Five years ago nobody was using Hadoop or MongoDB, and five years from now there will be new products. You need a solution that is future-proof.
Works with any BI tool. In every company multiple tools are in use. Each department has their favorite. We need to work with all of them.
No ETL, data warehouse, or cubes. It needs to offer a really good alternative to these options.
Makes data self-service, collaborative. Probably most important of all, we need to change the dynamic between the business and IT. We need to make it so business users can get the data they want, in the shape they want it, without waiting on IT.
Makes Big Data feel small. It needs to make billions of rows feel like a spreadsheet on your desktop.
Open source. It’s 2017, so we think this has to be open source.
Dynamic Rule-based Real-time Market Data Alerts (Flink Forward)
Flink Forward San Francisco 2022.
At Bloomberg, we deal with high volumes of real-time market data. Our clients expect to be notified of any anomalies in this market data, which may indicate volatile movements in the markets, notable trades, forthcoming events, or system failures. The parameters for these alerts are always evolving and our clients can update them dynamically. In this talk, we'll cover how we utilized the open source Apache Flink and Siddhi SQL projects to build a distributed, scalable, low-latency and dynamic rule-based, real-time alerting system to solve our clients' needs. We'll also cover the lessons we learned along our journey.
by Ajay Vyasapeetam & Madhuri Jain
Netflix’s architecture involves thousands of microservices built to serve unique business needs. As this architecture grew, it became clear that the data storage and query needs were unique to each area; there is no one silver bullet which fits the data needs of all microservices. CDE (the Cloud Database Engineering team) offers polyglot persistence, which promises to offer ideal matches between problem spaces and persistence solutions. In this meetup you will get a deep dive into the self-service platform, our solution for repairing Cassandra data reliably across different datacenters, Memcached Flash and cross-region replication, and graph database evolution at Netflix.
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D... (Cloudera, Inc.)
3 Things to Learn About:
*How Apache Kudu enables users to do more than ever before with their Analytic and Operational Databases
*How Cloudera has built two versatile databases to help our customers tackle their hardest problems.
*How the addition of Apache Kudu to this mix will enable new use cases around real-time analytics, internet of things, time series data, and more.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook (The Hive)
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
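To make the embedded-storage angle concrete, here is a minimal sketch using the RocksJava binding (the path and keys are illustrative): the store runs in-process and persists to local files, with no server to deploy.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class RocksDbExample {
    static { RocksDB.loadLibrary(); } // load the native library once

    public static void main(String[] args) throws Exception {
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/rocksdb-demo")) {
            // Plain byte[] key-value operations against the embedded store.
            db.put("user:42".getBytes(), "alice".getBytes());
            byte[] value = db.get("user:42".getBytes());
            System.out.println(new String(value)); // alice
        }
    }
}
```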
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Noritaka Sekiyama)
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... (Databricks)
Nowadays, people are creating, sharing, and storing data at a faster pace than ever before, and effective data compression/decompression can significantly reduce the cost of data usage. Apache Spark is a general distributed computing engine for big data analytics that stores and shuffles large amounts of data across the cluster at runtime, so the choice of compression/decompression codec can affect end-to-end application performance in many ways.
However, there's a trade-off between storage size and compression/decompression throughput (CPU computation). Balancing compression speed and ratio is a very interesting topic, particularly while both software algorithms and the CPU instruction set keep evolving. Apache Spark provides a very flexible compression codec interface with default implementations like GZip, Snappy, LZ4, and ZSTD, and the Intel Big Data Technologies team has implemented more codecs based on the latest Intel platforms, such as ISA-L (igzip), LZ4-IPP, Zlib-IPP, and ZSTD for Apache Spark. In this session, we compare the characteristics of those algorithms and implementations by running different micro workloads as well as end-to-end workloads on different generations of Intel x86 platforms and disks.
The session is meant to help big data software engineers choose the proper compression/decompression codecs for their applications, and we will also present methodologies for measuring and tuning the performance bottlenecks of typical Apache Spark workloads.
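As a hedged sketch of where those codec choices actually plug in (the app name, master, and paths are illustrative, not from the session): Spark exposes the shuffle/spill codec through spark.io.compression.codec, and output-file compression through a separate option on the writer.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class CodecConfigExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("codec-demo")
                .setMaster("local[*]") // for a quick local test
                // Codec for shuffle, broadcast and spill blocks:
                // lz4 (default), lzf, snappy or zstd.
                .set("spark.io.compression.codec", "zstd");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        // Output-file compression is a separate knob on the writer.
        spark.range(1_000_000).toDF("id")
             .write().option("compression", "snappy").parquet("/tmp/codec-demo");
        spark.stop();
    }
}
```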
How Uber scaled its Real Time Infrastructure to Trillion events per day (DataWorks Summit)
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder.
Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Samza) and in-house technologies have helped Uber scale.
Protecting your data at rest with Apache Kafka by Confluent and Vormetric (Confluent)
Learn how data in motion is secured within Apache Kafka and the broader Confluent Platform, while data at rest can be secured by solutions like Vormetric Data Security Manager.
"Wire Encryption In HDFS: Protect Your Data From Others, Not Yourself"
ApacheCon 2019, Las Vegas.
SPEAKERS: Chen Liang, Konstantin Shvachko, LinkedIn
Wire data encryption is a key component of the Hadoop Distributed File System (HDFS). HDFS can enforce different levels of data protection, allowing users to specify one based on their own needs. However, such enforcement comes as an all-or-nothing feature: wire encryption is enforced either for all accesses or none. Since encryption bears a considerable performance cost, the all-or-nothing condition forces users to choose between 'faster but unencrypted' or 'encrypted but slower' for all clients. In our use case at LinkedIn, we would like to selectively expose fast unencrypted access to fully managed internal clients, which can be trusted, while exposing only encrypted access to clients outside of the trusted circle, which carry higher security risks. That way we minimize performance overhead for trusted internal clients while still securing data from potential outside threats. We re-evaluated the RPC encryption mechanism in HDFS. Our design extends the HDFS NameNode to run on multiple ports; depending on the configuration, connecting to different NameNode ports yields different levels of encryption protection. This protection is then enforced for both NameNode RPC and the subsequent data transfers to/from DataNodes. System administrators then need only set up a simple firewall rule that allows access to the unencrypted port for internal clients and exposes the encrypted port to outside clients. This approach comes with minimal operational and performance overhead. The feature has been introduced to Apache Hadoop under HDFS-13541.
YARN Ready: Integrating to YARN with Tez (Hortonworks)
YARN Ready webinar series helps developers integrate their applications to YARN. Tez is one vehicle to do that. We take a deep dive including code review to help you get started.
Evening out the uneven: dealing with skew in Flink (Flink Forward)
Flink Forward San Francisco 2022.
When running Flink jobs, skew is a common problem that results in wasted resources and limited scalability. In the past years, we have helped our customers and users solve various skew-related issues in their Flink jobs or clusters. In this talk, we will present the different types of skew that users often run into: data skew, key skew, event time skew, state skew, and scheduling skew, and discuss solutions for each of them. We hope this will serve as a guideline to help you reduce skew in your Flink environment.
by
Jun Qin & Karl Friedrich
Outrageous Performance: RageDB's Experience with the Seastar Framework (ScyllaDB)
Learn how RageDB leveraged the Seastar framework to build an outrageously fast graph database. Understand the right way to embrace the triple digit multi-core future by scaling up and not out. Sacrifice everything for speed and get out of the way of your users. No drivers, no custom protocols, no query languages, no GraphQL, just code in and JSON out. Exploit the built in Seastar HTTP server to tie it all together.
Faster, better, stronger: The new InnoDB (MariaDB plc)
For MariaDB Enterprise Server 10.5, the default transactional storage engine, InnoDB, has been significantly rewritten to improve the performance of writes and backups. Next, we removed a number of parameters to reduce unnecessary complexity, not only in terms of configuration but of the code itself. And finally, we improved crash recovery thanks to better consistency checks and we reduced memory consumption and file I/O thanks to an all new log record format.
In this session, we’ll walk through all of the improvements to InnoDB, and dive deep into the implementation to explain how these improvements help everything from configuration and performance to reliability and recovery.
Thrift vs Protocol Buffers vs Avro - Biased Comparison (Igor Anishchenko)
Igor Anishchenko
Odessa Java TechTalks
Lohika - May, 2012
Let's take a step back and compare data serialization formats, of which there are plenty. What are the key differences between Apache Thrift, Google Protocol Buffers, and Apache Avro? Which is "The Best"? The truth of the matter is, they are all very good and each has its own strong points. Hence, the answer is as much a matter of personal choice as of understanding the historical context for each, and of correctly identifying your own, individual requirements.
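To ground the comparison, here is a small sketch of Avro's generic (codegen-free) Java API with a made-up record schema; Thrift and Protocol Buffers would instead generate classes from an IDL, which is one of the practical differences being weighed.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // The schema travels with the data (or lives in a registry),
        // so the binary encoding carries no field names or tags.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"},"
              + "{\"name\":\"age\",\"type\":\"int\"}]}");
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Igor");
        user.put("age", 30);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
        encoder.flush();
        System.out.println(out.size() + " bytes"); // compact binary record
    }
}
```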
I presented these slides at JAX2010 in Germany to showcase how to develop interactive dashboards with Flex and JAVA. More information on my blog (www.riagora.com).
English version of CEDEC Flash performance tuning presentation
Note: what these slides refer to as "Project Monocle" has now been released, and is called Adobe Scout ( http://gaming.adobe.com/technologies/scout/ )
Slides presented at the Webinale in Berlin to open your mind and your eyes on "What is Flash". Amazing projects developed by the Flash community. Innovation is in the DNA of Flash
Slides presented at the JAX2010 keynote in Germany by Michael CHAIZE, Flash Platform Evangelist for Adobe. Rich Internet Applications developed with Flex and JAVA.
Brand-new Flash 11 brings not only a first-class multi-platform development tool for iOS, Android, and BlackBerry, but also many new features for browser components and games. Support for otherwise programming-intensive HTML5 animations comes from Adobe Edge.
HBase and Hadoop at Adobe
1. Big Data with HBase and Hadoop at Adobe
Cosmin Lehene
Programatica, November, 2010
2. Who am I
Cosmin Lehene
Adobe Services and Infrastructure Team = SaaS services
HBase and Hadoop contributor
clehene@adobe.com
@clehene
http://hstack.org
3. Why I am here today
§ Riding the elephant since 2008
§ Analytics, BI, Machine Learning
§ Images, Videos, Flash, Web, etc.
4. Opaque Data (logs, archives)
§ Web traffic
§ Business events
§ User interactions
§ Infrastructure data
§ Database logs, web server logs, etc.
§ Etc.
7. Can I
§ JOIN everything?
§ Increase user engagement?
§ Increase conversion rate?
§ Make $$$? :)
§ Fast and cheap?
8. Understand data and extract meaning
Real-time access to meaningful data
9. Agenda
10. noSQL 101
11. Scaling RDBMS
§ Scale up
§ More memory
§ More CPU
§ Faster disks, SAN, etc.
§ Problems
§ Expensive
§ There's a limit
12. Scaling RDBMS
§ Scale horizontally
§ Replication (reads)
§ Sharding/ Horizontal Partitioning (writes)
§ Server 1: a-m, Server 2: m-z (see the routing sketch below)
§ Denormalization
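To make the key-range split above concrete, here is a minimal routing sketch (assumed code, not from the deck): a sorted map from each shard's lowest key to its server, with floorEntry picking the owner. The server names are illustrative, and keys are assumed to start with a lowercase letter.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeShardRouter {
    // Maps the lowest key a shard owns to that shard's address.
    private final NavigableMap<String, String> shards = new TreeMap<>();

    public RangeShardRouter() {
        shards.put("a", "db-server-1"); // keys a-l
        shards.put("m", "db-server-2"); // keys m-z
    }

    public String shardFor(String key) {
        // floorEntry returns the shard whose start key is <= the lookup key.
        return shards.floorEntry(key.toLowerCase()).getValue();
    }

    public static void main(String[] args) {
        RangeShardRouter router = new RangeShardRouter();
        System.out.println(router.shardFor("lehene")); // db-server-1
        System.out.println(router.shardFor("smith"));  // db-server-2
    }
}
```

Resharding is exactly the pain point a later slide calls out: moving a range boundary means migrating rows between servers while this map changes underneath the application.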
13. Replication
14. Sharding
15. Sharding
16. Sharding & Replication
17. Scaling RDBMS problems
§ Hard to repartition/reshard
§ Pre-allocate shards 2, 3, 100
§ Query each shard
§ High operational costs
§ Eventual consistency
18. Enter noSQL – the beginning
§ Google: BigTable
§ Amazon: Dynamo
§ Memcached
19. Data Models
§ Key-value
§ Columnar/Tabular
§ Document oriented
§ Graph
20. Architectures
§ Distributed hash tables
§ Consistent Hashing
§ Gossip
§ Vector clocks
§ Locality groups
§ Partitioning, replication
§ etc.
21. Properties
§ Scalability
§ Failover
§ Durability
§ Consistency
§ Availability
§ Partition Tolerance
§ Etc.
22. Cartesian Product
23. What do all these have in common?
§ Different data models
§ Different architectures
§ Different properties
= noSQL
24. Hadoop
http://hadoop.apache.org
§ HDFS (distributed fs)
§ Map-reduce (distributed processing; see the sketch below)
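As an assumed sketch of the map-reduce model (not code from the deck): a mapper that turns each log line into a (movieId, 1) pair and a reducer that sums the counts, in the style of the video-log analysis coming up; the tab-separated log layout is an assumption.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: one input line -> (movieId, 1).
class ViewMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);

    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t"); // user \t movieId \t genre
        if (fields.length >= 2) {
            ctx.write(new Text(fields[1]), ONE);
        }
    }
}

// Reduce phase: (movieId, [1, 1, ...]) -> (movieId, total views).
class ViewReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text movieId, Iterable<LongWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable c : counts) {
            sum += c.get();
        }
        ctx.write(movieId, new LongWritable(sum));
    }
}
```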
25. Adobe Media Player
Increase video consumption
26. AMP
§ Recommendations
§ Related content
§ Related users
27. Video logs
§ X watched movie A (comedy)
§ Y watched movie B (drama)
§ Z watched movie C (thriller)
§ Z watched movie A (comedy)
§ X watched movie D (technology)
§ Y watched movie C (thriller)
28. Which users are alike?
§ Compare every 2 users? (see the similarity sketch below)
§ 5M vectors
§ 120 dimensions
§ Distance is not enough – needed groups
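A minimal sketch (assumed, not from the deck) of one way to score how alike two users are, treating each user as a vector of per-genre watch counts and using cosine similarity; the vectors are illustrative. Doing this for every pair of 5M users is quadratic, which is why the talk turns to clustering instead.

```java
public class UserSimilarity {
    // Each user is a vector of watch counts per genre (~120 dimensions in the talk).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] x = {3, 0, 1}; // comedy, drama, technology
        double[] z = {2, 0, 0};
        System.out.println(cosine(x, z)); // ~0.95: similar taste
    }
}
```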
29. How?
30. Cluster projections
§ 1 month
§ 6GB
§ 700k Users
§ 114 genres
§ 7 nodes
§ 5 hours
§ 27 clusters
31. Game Constellations
§ Processing Shockwave logs
32. Lessons learned
Need:
§ Fine-grained access
§ Incremental updates
§ Deal with changes in the original dataset
§ Real-time data serving
34. http://hbase.apache.org
§ Sparse, distributed, persistent multidimensional sorted map
§ Column oriented store
§ Autosharding
§ Data locality
35. Data Model
table: row: family: column: value: version
domain.com/x.swf
swf:
swf:size = 1876 bytes | 1876 bytes
swf:fps = 30
swf:avm = 3
html:
embed = dynamic
status:
last_crawl = 2010/11/26 | last_crawl = 2010/11/25
domain.com/y.swf
domain.com/z.swf
36. API
§ Get
§ Put
§ Delete
§ Scan (see the client sketch below)
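A hedged sketch of those four operations against the swf table from the previous slide; it uses the modern HBase client API rather than the 2010-era one, and the connection details and values are assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SwfTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("swf"))) {
            // Put: the row key is the URL; columns live in the "swf" family.
            Put put = new Put(Bytes.toBytes("domain.com/x.swf"));
            put.addColumn(Bytes.toBytes("swf"), Bytes.toBytes("fps"), Bytes.toBytes("30"));
            table.put(put);

            // Get: fetch a single row by key.
            Result row = table.get(new Get(Bytes.toBytes("domain.com/x.swf")));
            System.out.println(Bytes.toString(
                    row.getValue(Bytes.toBytes("swf"), Bytes.toBytes("fps"))));

            // Scan: rows are sorted by key, so a start/stop row covers one domain.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("domain.com/"))
                    .withStopRow(Bytes.toBytes("domain.com0")); // '0' sorts right after '/'
            try (ResultScanner results = table.getScanner(scan)) {
                for (Result r : results) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            }

            // Delete: remove the row.
            table.delete(new Delete(Bytes.toBytes("domain.com/x.swf")));
        }
    }
}
```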
37. Flash
How is Flash used
38. How is Flash used in the “wild”?
§ AVM popularity
§ Frame rates
§ Video formats
§ SWF size
§ Flex data structures
§ …
39. How
40. How
max 1000
41. The hard way
§ Hadoop
§ Nutch
§ HBase
42. Workflow
§ Crawl:
§ Nutch (seed: top-1m.csv Alexa)
§ Detect Flash embed, javascript
§ Browse:
§ Hadoop + FF + FP (chromeless)
§ Dump stack traces, memory, swf bytes, etc.
§ Process:
§ Parse stack traces, rank, etc.
§ Export:
§ HBase: swf table
§ MD5, swf bytecode, memory, load time, etc.
46. Benefits
§ Security fixes
§ Optimization
§ Prioritize based on real usage
§ Testing
47. SaasBase – HBase++ as a service
§ Data storage (HBase + HDFS)
§ Domains, tables
§ API: create, put, get, scan
§ Analytics (HBase + Hadoop + query engine)
§ Reports, dimensions, metrics
§ API: query
48. photoshop.com
Image analytics
50. photoshop.com
§ 1B assets (images, videos, other)
§ 120M with EXIF metadata
§ 1.5 petabytes
§ Home grown distributed storage
51. Intelligence
§ Targeting users:
§ Professionals or Amateurs?
§ Where are pictures taken?
§ Targeting partners:
§ Popular cameras
§ Tracking campaigns
§ New accounts
56. Stats
§ 7 Machines (16 cores, 24 x 10K RPM SATA, 32GB RAM, 1Gbps)
§ Map 700M records
§ 2hrs, 41mins
§ Map output: 1.9B records (~80GB)
57. Lessons
§ SUM, COUNT, AVG, MIN, MAX, GROUP BY, HAVING, etc.
§ Rollup, drilldown, segmentation
-----------------------------------------------------------
It’s all about Dimensions & Metrics
58. Recap
§ Hadoop + Mahout + PIG (User clusters)
§ HBase + Hadoop + Nutch + MySQL (Flash analytics)
§ HBase + Hadoop (EXIF Explorer, image analytics)
59. Business Catalyst
Analytics
60. BC
§ End to end platform for online businesses
§ E-commerce, Blogging, CRM, email marketing
§ Analytics: web traffic, affiliates, sales, etc.
63. Successtrophe
§ Analytics is troublesome
§ SQL database was slow for analytics
§ Over 50 different reports
§ Over 100,000 websites
§ Billions of page views
64. Requirements
§ Fast incremental processing
§ Custom reporting
§ Filtering, segmentation, rollups, drilldowns
§ Variable time ranges
§ Fast
65. Solution
§ Continuous processing (every 10 minutes)
§ Reports definition: dimensions, metrics
§ Real-time queries: directly from HBase
66. Workflow
§ Import Logs ->HBase
§ Incrementally process/index last 24 hours
§ Serve from HBase
§ Index scans
§ Runtime aggregation (see the query sketch below)
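A hedged sketch of what "serve from HBase" with runtime aggregation can look like: scan precomputed daily report rows by key range and aggregate at query time. The row-key layout (siteId|reportId|yyyyMMdd), table name, and column family are assumptions for illustration, not Adobe's actual schema.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReportQuery {
    // Assumed row-key layout: siteId|pageviews|yyyyMMdd, one row per site and day.
    public static Map<String, Long> dailyPageViews(Connection conn, String site,
                                                   String fromDay, String toDay)
            throws Exception {
        Map<String, Long> viewsByDay = new HashMap<>();
        Scan scan = new Scan()
                .withStartRow(Bytes.toBytes(site + "|pageviews|" + fromDay))
                .withStopRow(Bytes.toBytes(site + "|pageviews|" + toDay), true);
        try (Table table = conn.getTable(TableName.valueOf("reports"));
             ResultScanner rows = table.getScanner(scan)) {
            for (Result r : rows) {
                String key = Bytes.toString(r.getRow());
                String day = key.substring(key.lastIndexOf('|') + 1);
                long views = Bytes.toLong(
                        r.getValue(Bytes.toBytes("m"), Bytes.toBytes("views")));
                viewsByDay.merge(day, views, Long::sum); // aggregate at query time
            }
        }
        return viewsByDay;
    }
}
```

Because rows come back in key order, variable time ranges fall out of the start/stop keys, and rollups are just a different merge function.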
67. Stats
§ 1 datacenter, 10 months = 1 hour, 24 minutes
§ > 3 Billion report items generated
68. Lessons
§ UNIQUE is harder
§ E.g.: Unique visitors, Visitor loyalty
§ Space vs. time
§ Sorting magic
69. Not just web analytics
X Analytics
§ Feed in any file format (w3c, apache, tsv, etc.)
§ Tag the dimensions and metrics
§ Process (incremental)
§ Query in real-time
70. Nothing but the hstack
§ structured data storage: HBase
§ file storage: HDFS
§ data processing: Hadoop
71. Conclusions
§ Keep data
§ Understand data
§ Explore data
§ Extract meaning
®
Copyright 2009 Adobe Systems Incorporated. All rights reserved. Adobe con dential. 71
7
72. http://hstack.org
http://hbase.apache.org
http://hadoop.apache.org
http://mahout.apache.org
http://nutch.apache.org