Building Cloud-Native App Series - Part 11 of 11
Microservices Architecture Series
Service Mesh - Observability
- Zipkin
- Prometheus
- Grafana
- Kiali
Visualize some of Austin's open source data using Elasticsearch with Kibana. ObjectRocket's Steve Croce presented this talk on 10/13/17 at the DBaaS event in Austin, TX.
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges in implementing Data Mesh systems and focus on the role open-source projects play in them. Projects like Apache Spark can play a key part in implementing the standardized infrastructure platform of a Data Mesh. We will examine the landscape of useful data engineering open-source projects to apply in several areas of a Data Mesh system in practice, along with an architectural example. We will also touch on what work (culture, tools, mindset) needs to be done to make Data Mesh more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...) - Databricks
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. It has been proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber. In the last few years, Presto has experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
Presentation on Data Mesh: the paradigm shift toward a new type of ecosystem architecture, a modern distributed architecture that decentralizes domain-specific data and views "data as a product," enabling each domain to handle its own data pipelines.
MLOps journey at Swisscom: AI Use Cases, Architecture and Future Vision - BATbern
What powers the AI/ML services of Switzerland's leading telecommunication company? In this talk, we will provide an overview of the different AI/ML projects at Swisscom, from Conversational AI and Recommender Systems to Anomaly Detection. Moreover, we will show how we automate, scale, and operationalise these ML pipelines in production, highlighting the MLOps techniques and open source tools that are used. Finally, we will present Swisscom's roadmap towards the cloud with AWS and discuss how we envision a common MLOps solution for the organisation.
Real-time Analytics with Trino and Apache Pinot - Xiang Fu
Trino Summit 2021:
An overview of the Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support with the power of Apache Pinot's real-time analytics, giving you the best of both worlds.
Designing the Next Generation of Data Pipelines at Zillow with Apache Spark - Databricks
The trade-off between development speed and pipeline maintainability is a constant for data engineers, especially those in a rapidly evolving organization.
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise - DataWorks Summit
In recent years, big data has moved from batch processing to stream-based processing, since no one wants to wait hours or days to gain insights. Dozens of stream-processing frameworks exist today, and the same trend that occurred in the batch-based big data realm has taken place in the streaming world, so that nearly every streaming framework now supports higher-level relational operations.
On paper, combining Apache NiFi, Kafka, and Spark Streaming provides a compelling architecture option for building your next-generation ETL data pipeline in near real time. But what does it look like to deploy and operationalize this in an enterprise production environment?
The newer Spark Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing with elegant code samples, but is that the whole story?
We discuss the drivers and expected benefits of changing the existing event processing systems. In presenting the integrated solution, we will explore the key components of using NiFi, Kafka, and Spark, then share the good, the bad, and the ugly when trying to adopt these technologies into the enterprise. This session is targeted toward architects and other senior IT staff looking to continue their adoption of open source technology and modernize ingest/ETL processing. Attendees will take away lessons learned and experience in deploying these technologies to make their journey easier.
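The "end-to-end exactly-once" guarantee mentioned above is typically achieved by pairing a replayable source with an idempotent or transactional sink. A minimal, framework-free sketch of that idea follows; all names here are hypothetical, and real Spark Structured Streaming manages offset tracking and commits for you via checkpointing.

```python
# Toy illustration of effective exactly-once delivery: a sink keyed by
# (partition, offset) makes re-delivered records harmless, so replaying
# a batch after a failure does not duplicate output.

class IdempotentSink:
    def __init__(self):
        self.rows = {}  # (partition, offset) -> value

    def write(self, partition, offset, value):
        # A replayed record overwrites the same key with the same value.
        self.rows[(partition, offset)] = value

def run_batch(sink, records):
    for partition, offset, value in records:
        sink.write(partition, offset, value)

sink = IdempotentSink()
batch = [(0, 0, "a"), (0, 1, "b")]
run_batch(sink, batch)
run_batch(sink, batch)  # simulate a replay after a failure
assert len(sink.rows) == 2  # still two rows, not four
```

The same principle underlies transactional sinks: at-least-once delivery plus deduplication at the write path yields exactly-once results.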
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal - Databricks
Ingesting data from a variety of sources (MySQL, Oracle, Kafka, Salesforce, BigQuery, S3, SaaS applications, OSS, etc.), with billions of records, into a data lake (for reporting, ad-hoc analytics, and ML jobs) with reliability, consistency, schema-evolution support, and within the expected SLA has always been a challenging job. Ingestion also comes in different flavors: full ingestion, and incremental ingestion with or without compaction/de-duplication and transformations, each with its own complexity of state management and performance. Not to mention dependency management, where hundreds or thousands of downstream jobs depend on this ingested data, making on-time data availability of utmost importance. Most data teams end up creating ad-hoc ingestion pipelines written in different languages and technologies, which adds operational overhead and leaves the knowledge limited to a few people.
In this session, I will talk about how we leveraged Spark's DataFrame abstraction to create a generic ingestion platform capable of ingesting data from varied sources with reliability, consistency, automatic schema evolution, and transformation support. I will also discuss how we developed a Spark-based data-sanity check as one of the core components of this platform to ensure 100% correctness of ingested data and auto-recovery when inconsistencies are found. This talk will also cover how Hive table creation and schema modification were handled by the platform, providing read-time consistency without locking while Spark ingestion jobs were writing to the same Hive tables, and how we maintained different versions of ingested data so that we could roll back if required and users could go back in time and read a snapshot of the data as of that moment.
After this talk, you should understand the challenges involved in ingesting data reliably from different sources and how Spark's DataFrame abstraction can solve them in a unified way.
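Two of the ideas in this abstract, incremental ingestion with de-duplication and additive schema evolution, can be sketched without any framework at all. The function names below are hypothetical; the talk itself uses Spark DataFrames for this.

```python
# Minimal sketch: upsert an increment into existing rows (last write
# wins on the key), then unify schemas so new columns appear as nulls
# for older rows.

def merge_incremental(existing, increment, key):
    """De-duplicate on `key`; rows from the increment win."""
    by_key = {row[key]: row for row in existing}
    for row in increment:
        by_key[row[key]] = row
    return list(by_key.values())

def unify_schema(rows):
    """Additive schema evolution: every row gets every seen column."""
    columns = set()
    for row in rows:
        columns.update(row)
    return [{c: row.get(c) for c in columns} for row in rows]

day1 = [{"id": 1, "name": "a"}]
day2 = [{"id": 1, "name": "a2"}, {"id": 2, "name": "b", "email": "b@x"}]
merged = unify_schema(merge_incremental(day1, day2, "id"))
# merged now holds two rows; the id=1 row has email=None
```

In Spark terms, the merge step corresponds to a keyed upsert/compaction job, and the schema step to DataFrame schema merging.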
Apache Iceberg: An Architectural Look Under the Covers - ScyllaDB
Data lakes have been built with a desire to democratize data - to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve this is hiding the complexity of the underlying data structures and physical data storage from users. The de facto standard, the Hive table format, addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.
The Apache Iceberg table format is now used and contributed to by many leading tech companies, including Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.
Watch Alex Merced, Developer Advocate at Dremio, as he describes the open architecture and performance-oriented capabilities of Apache Iceberg.
You will learn:
• The issues that arise when using the Hive table format at scale, and why we need a new table format
• How a straightforward, elegant change in table format structure has enormous positive effects
• The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it
• The resulting benefits of this architectural design
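The Iceberg idea described above can be modeled in miniature: a table is an append-only log of immutable snapshots, each pointing at the full set of data files valid at that moment, so time travel and rollback are just "read an older snapshot." This is purely illustrative with hypothetical names; real Iceberg tracks manifest lists and manifests, not raw file sets.

```python
# Toy snapshot-based table format: every commit records the complete
# set of files for that version, and reads pick a snapshot.

class Table:
    def __init__(self):
        self.snapshots = []  # append-only snapshot log

    def commit(self, files):
        self.snapshots.append(frozenset(files))

    def read(self, snapshot_id=-1):
        # Default: the latest snapshot; older ids give time travel.
        return self.snapshots[snapshot_id]

t = Table()
t.commit({"data-0.parquet"})                    # initial write
t.commit({"data-0.parquet", "data-1.parquet"})  # append
t.commit({"data-1.parquet"})                    # delete data-0
assert t.read() == {"data-1.parquet"}           # current view
assert t.read(0) == {"data-0.parquet"}          # time travel
```

Because snapshots are immutable, readers never see a half-finished write, which is how the format provides consistency without locking.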
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve... - Randy Shoup
eBay architects Randy Shoup and Dan Pritchett give a guided tour of the eBay architecture. They cover the evolution of the technology stack from Perl to C++ to Java. And they discuss scaling strategies for the data tier, application tier, search, and operations.
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping - Ilyas F ☁☁☁
If you are a Cloud Architect, Developer, IT Manager, Director, or anyone else associated with the Azure or AWS cloud in some form, I’m sure you have come across a common question:
“What is the equivalent service in Azure or AWS (and vice versa), and what is its pricing?” I’m sure you will say yes!
Agreed, it’s hard to remember all the services offered by the public clouds, i.e. Azure and AWS. Remembering the existing services and their benefits is itself a big task; on top of that, keeping up with new feature releases and enhancements is another major one.
So I put together a Service & Feature Mapping between Microsoft Azure and AWS for my own and my colleagues’ quick reference.
I hope you also find this piece informative.
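In code, such a mapping is essentially a lookup table. A tiny illustrative slice follows: the service pairs shown are well-known equivalences, but the dict and function names are hypothetical, and a real mapping would cover far more services.

```python
# Hypothetical lookup table for a few well-known AWS -> Azure pairs.
AWS_TO_AZURE = {
    "EC2": "Virtual Machines",
    "S3": "Blob Storage",
    "Lambda": "Functions",
    "RDS": "SQL Database",
    "CloudWatch": "Monitor",
}

def azure_equivalent(aws_service):
    # Fall back explicitly when no direct counterpart is listed.
    return AWS_TO_AZURE.get(aws_service, "no direct equivalent")

assert azure_equivalent("S3") == "Blob Storage"
```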
TeraStream - Data Integration/Migration/ETL/Batch Tool - DataStreams
TeraStream™ leads the Korean data migration and ETL market. Take a look at the powerful performance, features, and user conveniences of TeraStream™.
What is BEANs? Big data Analysis Network System
Easy Big Data: an easy, convenient, and reasonably priced big data package.
It is a large-scale data analysis system that supports the entire process of collecting, storing, analyzing, and visualizing structured, semi-structured, and unstructured big data that existing systems could not analyze. It is optimized to give customers insight into, and solutions for, the data they hold, and supports a packaged, service-tailored hybrid DW.
The Pentaho BI certification training provides a platform, including a server, client tools, and supporting technologies, that enables a full spectrum of BI functionality. Our specially designed Pentaho training/tutorial covers the basic architecture of Pentaho, an introduction to the Pentaho BI suite, a Pentaho Kettle tutorial, and data integration and data analytics training.
To download, visit www.softwaretrainingmaterials.blogspot.com
Pentaho Business Intelligence Services offers an innovative solution for maximum credibility, so that customers can manage their business as well as focus on business growth. Find out more in this PPT.
Korea's No. 1 company in total data management and big data processing, providing high-performance software products and professional consulting services for data integration and migration, data governance, data warehousing, and big data processing and storage.
How should you manage your company's ever-growing data?
The foundation of data management is accurate DB design and construction, which requires smart data modeling practices and tools.
This is the deck from DevGear's online seminar "14 Methods for Effective Data Modeling"; you can rewatch the seminar at the following link: http://goo.gl/DlfO8I
STEG - IT operations management (ITOM) is the administrative area covering technology infrastructure components and the requirements of individual applications, services, storage, networking, and connectivity elements within an organization.