This document discusses using StreamSets Data Collector (SDC) to build a logging infrastructure for microservices. SDC can ingest logs from microservices running in containers and handle issues like schema changes and new log formats. It processes and transforms the logs, sending them to destinations like Kafka. SDC pipelines can run on Spark clusters on YARN and Mesos to handle large volumes of log data and load it into systems like HDFS, HBase, and Elasticsearch for analysis.
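To make that kind of pipeline concrete, here is a minimal sketch (not taken from the deck) of a containerized microservice emitting structured JSON log lines to stdout; the field names such as `service` and `trace_id` are illustrative assumptions, and an ingest pipeline like SDC could collect such output and route the parsed records to Kafka.

```python
# Minimal sketch: a microservice emitting one JSON log record per line to stdout.
# Field names (service, level, trace_id) are illustrative assumptions; a collector
# such as StreamSets Data Collector could parse this output and forward it to Kafka.
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "service": "orders-api",          # hypothetical service name
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("orders-api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order created", extra={"trace_id": "abc123"})
```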
As GoPro expands into content networks and launches new products, new challenges have appeared. One of the most critical challenges facing GoPro during this period of rapid growth is its ability to make effective use of massive amounts of data. Every day, GoPro collects increasing amounts of data generated by internet-connected consumer devices (smart cameras, smart drones), GoPro mobile apps, GoPro content networks, GoPro e-commerce sales, and social media. This data ranges from raw camera logs to refined and well-structured e-commerce datasets. In the past, it took GoPro months to understand new inbound data and determine how to transform or augment it for analysis. To streamline this process and bridge the gap between tech-savvy engineers and data-savvy analysts, GoPro is creating an analysis loop, which informs product usage trends and product insights. This analysis loop serves a large ecosystem of GoPro executives, product managers, engineers, data scientists, and business analysts through an integrated technology pipeline consisting of Apache Kafka, Apache Spark Streaming, Cloudera’s distribution of Hadoop, and Tableau’s data visualization software as the end-user analytical tool. Session sponsored by Tableau Software.
AWS re:Invent 2016: Migrating a Highly Available and Scalable Database from O... - Amazon Web Services
In this session, we share how an Amazon.com team that owns a document management platform that manages billions of critical customer documents for Amazon.com migrated from a relational to a non-relational database. Initially, the service was built on an Oracle database. As it grew, the team discovered the limits of the relational model and decided to migrate to a non-relational database. They chose Amazon DynamoDB for its built-in resilience, scalability, and predictability. We provide a template that you can use to migrate from a relational data store to DynamoDB. We also provide details about the entire process: design patterns for moving from a SQL schema to a NoSQL schema; mechanisms used to transition from an ACID (Atomicity, Consistency, Isolation, Durability) model to an eventually consistent model; migration alternatives considered; pitfalls in common migration strategies; and how to ensure service availability and consistency during migration.
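The abstract does not include the team's actual schema, so the following is only a hedged, hypothetical boto3 sketch of the kind of SQL-to-NoSQL design pattern it describes: collapsing a relational customer/document join into a single DynamoDB table with a composite partition/sort key. The table and attribute names are assumptions, not details from the session.

```python
# Hypothetical sketch of a SQL-to-NoSQL design pattern: a relational
# customer -> documents join becomes one DynamoDB table with a composite key,
# so all documents for a customer are fetched with a single Query.
# Table and attribute names are illustrative, not from the session.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("CustomerDocuments")  # assumed table name

# Write: the partition key groups items by customer, the sort key orders documents.
table.put_item(Item={
    "customer_id": "CUST#42",
    "document_id": "DOC#2016-11-30#0001",
    "title": "Invoice",
    "status": "ACTIVE",
})

# Read: one Query replaces the relational join.
resp = table.query(KeyConditionExpression=Key("customer_id").eq("CUST#42"))
for item in resp["Items"]:
    print(item["document_id"], item["title"])
```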
AWS re:Invent 2016: Streaming ETL for RDS and DynamoDB (DAT315) - Amazon Web Services
During this session Greg Brandt and Liyin Tang, Data Infrastructure engineers from Airbnb, will discuss the design and architecture of Airbnb's streaming ETL infrastructure, which exports data from RDS for MySQL and DynamoDB into Airbnb's data warehouse, using a system called SpinalTap. We will also discuss how we leverage Spark Streaming to compute derived data from tracking topics and/or database tables, and HBase to provide immediate data access and generate cleanly time-partitioned Hive tables.
Choosing the Right Database for the Job: Relational, Cache, or NoSQL? - Amazon Web Services
Developers and DBAs from a traditional relational background are spoilt for choice when looking to integrate caching and NoSQL into an application architecture to solve scaling problems and reduce costs. Even when using relational databases there are 3 managed database services on AWS for the MySQL engine alone. Trying to evaluate all the options often creates analysis paralysis, resulting in a reluctance to try something new or different. This session will guide you through a series of use cases that use different databases to solve business problems that customers face today.
Building analytics applications requires more than just one good service. It requires the ability to capture a vast amount of data and react to data changes in real time. It requires flexible tools that enable end users to work in the way they can be most productive, and that address the needs of both data consumers and data scientists. This analysis won't just be about data exploration and reports, but must be able to support the largest-scale, most complex machine and deep learning models imaginable. Across it all, strong governance, security, and cataloguing are essential. In this session, come hear how to build a full-stack analytics application using AWS services. We'll see how to capture static and dynamic data in real time, and react to data changes. We'll see AWS services that span analytics from drag-and-drop tools, through simple query-on-files, to exascale data science. At the end, we'll have a data lake architecture that will meet the demands of the most sophisticated analytics customers for many years to come.
AWS Speaker: Ian Robinson, Specialist Solution Architect, Big Data and Analytics, EMEA - Amazon Web Services
AWS re:Invent 2016: Fireside chat with Groupon, Intuit, and LifeLock on solvi... - Amazon Web Services
Redis Labs' CMO is hosting a fireside chat with leaders from multiple industries including Groupon (e-commerce), Intuit (finance), and LifeLock (identity protection). This conversation-style session will cover the Big Data-related challenges faced by these leading companies as they scale their applications, ensure high availability, serve the best user experience at the lowest latencies, and optimize between cloud and on-premises operations. The introductory-level session will appeal to both developer and DevOps functions. Attendees will hear about diverse use cases such as recommendation engines, hybrid transaction and analytics operations, and time-series data analysis. The audience will learn how the Redis in-memory database platform addresses the above use cases with its multi-model capability, in a cost-effective manner, to meet the needs of next-generation applications. Session sponsored by Redis Labs.
AWS re:Invent 2016: Automating Workflows for Analytics Pipelines (DEV401) - Amazon Web Services
Learn how to leverage new workflow management tools to simplify complex data pipelines and ETL jobs spanning multiple systems. In this technical deep dive from Treasure Data, the company's founder and chief architect walks through the codebase of DigDag, our recently open-sourced workflow management project. He shows how workflows can break large, error-prone SQL statements into smaller blocks that are easier to maintain and reuse. He also demonstrates how a system using ‘last good’ checkpoints can save hours of computation when restarting failed jobs and how to use standard version control systems like GitHub to automate data lifecycle management across Amazon S3, Amazon EMR, Amazon Redshift, and Amazon Aurora. Finally, you see a few examples where SQL-as-pipeline-code gives data scientists both the right level of ownership over production processes and a comfortable abstraction from the underlying execution engines. This session is sponsored by Treasure Data.
AWS Competency Partner
When the Cloud is a Rockin: High Availability in Apache CloudStack - John Burwell
CloudStack currently provides a variety of bespoke high-availability mechanisms for resources such as virtual machines, hosts, and virtual routers. Each of these implementations duplicates the HA check/recovery cycle, as well as the concurrency, persistence, and clustering required to manage high availability for any CloudStack resource. The High Availability Resource Management Service has been developed to consolidate these concerns -- providing a robust, extensible HA mechanism. Using this service, plugins only need to define health check, activity check, and fence operations.
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a... - Amazon Web Services
Building big data applications often requires integrating a broad set of technologies to store, process, and analyze the increasing variety, velocity, and volume of data being collected by many organizations.
Using a combination of Amazon EMR, a managed Hadoop framework, and Amazon Redshift, a managed petabyte-scale data warehouse, organizations can effectively address many of these requirements.
In this webinar, we will show how organizations are using Amazon EMR and Amazon Redshift to build more agile and scalable architectures for big data. We will look into how you can leverage Spark and Presto running on EMR, to address multiple data processing requirements. We will also share best practices and common use cases to integrate EMR and Redshift.
Learning Objectives:
• Best practices for building a big data architecture that includes Amazon EMR and Amazon Redshift
• Understand how to use technologies such as Amazon EMR, Presto and Spark to complement your data warehousing environment
• Learn key use cases for Amazon EMR and Amazon Redshift
Who Should Attend:
• Data architects, Data management professionals, Data warehousing professionals, BI professionals
Project Ouroboros: Using StreamSets Data Collector to Help Manage the StreamS... - Pat Patterson
On a typical day we see hundreds of downloads of StreamSets Data Collector, our open source data integration tool. We used to wrangle our download logs using a combination of the AWS S3 command line, sed, grep, awk and other tools, all run from a shell script (on my laptop!) once a week. This was a classic example of a brittle, hard-to-maintain, custom data integration. One day it dawned on me, "This is crazy, we have a tool that can do all this!". In this session, I'll explain how I built a dataflow pipeline to stream content delivery network (CDN) logs from S3 to MySQL in real time, allowing us to gain valuable insights into our open source community. You'll also learn how we use the same techniques to not only gain insights into our community on Slack, but also build tools to better serve them.
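The talk does not publish its pipeline configuration, but a hedged Python sketch can illustrate the kind of work the old shell script did and the pipeline now automates: list CDN log objects in S3, parse tab-separated entries, and load rows into MySQL. The bucket, table, and column names are assumptions, as is the gzipped, CloudFront-style tab-separated log layout.

```python
# Hypothetical sketch of the manual work such a pipeline replaces: pull CDN log
# files from S3, parse tab-separated entries, and insert download rows into MySQL.
# Bucket/table/column names and the log layout are assumptions, not the real setup.
import gzip

import boto3
import pymysql

s3 = boto3.client("s3")
db = pymysql.connect(host="localhost", user="analytics", password="secret",
                     database="downloads")

with db.cursor() as cur:
    for obj in s3.list_objects_v2(Bucket="sdc-cdn-logs", Prefix="2016/")["Contents"]:
        body = s3.get_object(Bucket="sdc-cdn-logs", Key=obj["Key"])["Body"].read()
        for line in gzip.decompress(body).decode().splitlines():
            if line.startswith("#"):           # skip header lines
                continue
            fields = line.split("\t")          # assumed tab-separated CDN format
            date, time_, client_ip, uri = fields[0], fields[1], fields[4], fields[7]
            cur.execute(
                "INSERT INTO downloads (log_date, log_time, client_ip, uri) "
                "VALUES (%s, %s, %s, %s)",
                (date, time_, client_ip, uri),
            )
db.commit()
```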
Amazon Web Services gives you fast access to flexible and low cost IT resources, so you can rapidly scale and build virtually any big data and analytics application including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing regardless of volume, velocity, and variety of data.
In this one-hour webinar, we will look at the portfolio of AWS Big Data services and how they can be used to build a modern data architecture.
We will cover:
Using different SQL engines to analyze large amounts of structured data
Analysing streaming data in near-real time
Architectures for batch processing
Best practices for Data Lake architectures
This session is suited for:
Solution and enterprise architects
Data architects/ Data warehouse owners
IT & Innovation team members
Understanding AWS Managed Database and Analytics Services | AWS Public Sector... - Amazon Web Services
The world is creating more data in more ways than ever before. The average internet user in 2017 generates 1.5GB of data per day, with the rate doubling every 18 months. A single autonomous vehicle can generate 4TB per day. Each smart manufacturing plant generates 1PB per day. Storing, managing, and analyzing this data requires integrated database and analytic services that provide reliability and security at scale. AWS offers a range of managed data services that let customers focus on making data useful, including Amazon Aurora, RDS, DynamoDB, Redshift, Spectrum, ElastiCache, Kinesis, EMR, Elasticsearch Service, and Glue. In this session, we discuss these services, share our vision for innovation, and show how our customers use these services today. Learn More: https://aws.amazon.com/government-education/
FSI301 An Architecture for Trade Capture and Regulatory Reporting - Amazon Web Services
For many securities organizations, post-trade processing is expensive, cumbersome, and time-consuming. This is in part due to the massive volumes of data required for processing a trade and the limited agility of the technology many organizations rely on today. In order to create efficiencies and move faster, many Financial Services organizations are working with AWS to implement post-trade solutions built with AWS’ storage services (S3 and Glacier) and big data capabilities (Athena, EMR, Redshift, and QuickSight). In this session, AWS will walk through a trade capture and regulatory reporting solution that utilizes the aforementioned AWS services. We will also provide guidance around obtaining data-driven insights (from pixels to pictures), bolstering encryption with Amazon KMS, and maintaining transparency and control with Amazon CloudWatch and Amazon CloudTrail (which also helps meet SEC Rule 613 that requires the creation of comprehensive consolidated audit trails).
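As a small, hedged illustration of the query side of such an architecture, the boto3 sketch below submits an Athena query over trade data stored in S3; the database, table, and bucket names are assumptions rather than details from the session.

```python
# Hedged sketch: run an Athena query over trade records stored in S3.
# The database, table, and output bucket names are assumptions.
import time

import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="""
        SELECT trade_date, symbol, COUNT(*) AS trades
        FROM trades
        WHERE trade_date = DATE '2016-11-30'
        GROUP BY trade_date, symbol
    """,
    QueryExecutionContext={"Database": "post_trade"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```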
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ... - Amazon Web Services
Organizations need to gain insight and knowledge from a growing number of Internet of Things (IoT), application programming interfaces (API), clickstreams, unstructured and log data sources. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world’s largest social platform for online giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform called RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven, data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
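RAVEN's internals are not described in the abstract, so the sketch below is only a generic illustration of the serverless, event-driven pattern the session covers: an AWS Lambda handler triggered by an S3 object-created event that forwards a small record to a Kinesis stream. The stream name and record layout are assumptions.

```python
# Generic sketch of an event-driven ETL step (not JustGiving's actual code):
# a Lambda function triggered by an S3 "object created" event pushes a record
# onto a Kinesis stream for downstream processing. The stream name and record
# fields are assumptions.
import json

import boto3

kinesis = boto3.client("kinesis")


def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        kinesis.put_record(
            StreamName="raw-events",          # assumed stream name
            Data=json.dumps({"bucket": bucket, "key": key}),
            PartitionKey=key,
        )
    return {"processed": len(event.get("Records", []))}
```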
AWS re:Invent 2016: Cloud Monitoring - Understanding, Preparing, and Troubles... - Amazon Web Services
Applications running in a typical data center are static entities. Dynamic scaling and resource allocation are the norm in AWS. Technologies such as Amazon EC2, Docker, AWS Lambda, and Auto Scaling make tracking resources and resource utilization a challenge. The days of static server monitoring are over.
In this session, we examine trends we’ve observed across thousands of customers using dynamic resource allocation and discuss why dynamic infrastructure fundamentally changes your monitoring strategy. We discuss some of the best practices we’ve learned by working with New Relic customers to build, manage, and troubleshoot applications and dynamic cloud services. Session sponsored by New Relic.
AWS Competency Partner
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale - Amazon Web Services
"With cloud maturity comes operational efficiencies and endless potential for innovation and business growth. However, the complexities of governing cloud infrastructure are impeding without the right strategy. Visibility, accountability, and actionable insights are some of the most invaluable considerations. The AWS cloud clearly enables convenience and cost savings for organizations that know how to leverage its full potential. Amazon EC2 Reserved Instances (RIs) in particular, present a tremendous opportunity when scaling to save significantly on capacity but there are many considerations to fully reaping the benefits of RIs. In this session, CloudCheckr CTO Patrick Gartlan will present issues that every organization runs into when scaling, provide best practices for how to combat them and help you show your boss how RIs help you save money and move faster.
This session is brought to you by AWS Summit New York City sponsor, CloudCheckr. "
Building Continuously Curated Ingestion Pipelines - Arvind Prabhakar
Data ingestion is a critical piece of infrastructure for any Big Data project. Learn about the key challenges in building ingestion infrastructure and how enterprises are solving them using low-level frameworks like Apache Flume and Kafka, and higher-level systems such as StreamSets.
Open Source Big Data Ingestion - Without the Heartburn! - Pat Patterson
Big Data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Upstream data sources can 'drift' due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail, inducing heartburn in even the most resilient data scientist. This session will survey the big data ingestion landscape, focusing on how open source tools such as Sqoop, Flume, NiFi and StreamSets can keep the data pipeline flowing.
Building Data Pipelines with Spark and StreamSets - Pat Patterson
Big data tools such as Hadoop and Spark allow you to process data at unprecedented scale, but keeping your processing engine fed can be a challenge. Metadata in upstream sources can ‘drift’ due to infrastructure, OS and application changes, causing ETL tools and hand-coded solutions to fail. StreamSets Data Collector (SDC) is an Apache 2.0 licensed open source platform for building big data ingest pipelines that allows you to design, execute and monitor robust data flows. In this session we’ll look at how SDC’s “intent-driven” approach keeps the data flowing, with a particular focus on clustered deployment with Spark and other exciting Spark integrations in the works.
Adaptive Data Cleansing with StreamSets and Cassandra (Pat Patterson, StreamS... - DataStax
Cassandra is a perfect fit for consuming high volumes of time-series data directly from users, devices, and sensors. Sometimes, though, when we consume data from the real world, systematic and random errors creep in. In this session, we'll see how to use open source tools like RabbitMQ and StreamSets Data Collector with Cassandra features such as User Defined Aggregates to collect, cleanse and ingest variable quality data at scale. Discover how to combine the power of Cassandra with the flexibility of StreamSets to implement adaptive data cleansing.
About the Speaker
Pat Patterson, Community Champion, StreamSets
Pat Patterson has been working with Internet technologies since 1997, building software and working with communities at Sun Microsystems, Huawei, Salesforce and StreamSets. At Sun, Pat was the community lead for OpenSSO, while at Huawei he developed cloud storage infrastructure software. A developer evangelist at Salesforce, Pat focused on identity, integration and IoT. Now community champion at StreamSets, Pat is responsible for the care and feeding of the StreamSets open source community.
Adaptive Data Cleansing with StreamSets and Cassandra - Pat Patterson
Presented at Cassandra Summit 2016.
Cassandra is a perfect fit for consuming high volumes of time-series data directly from users, devices, and sensors. Sometimes, though, when we consume data from the real world, systematic and random errors creep in. In this session, we'll see how to use open source tools like RabbitMQ and StreamSets Data Collector with Cassandra features such as User Defined Aggregates to collect, cleanse and ingest variable quality data at scale. Discover how to combine the power of Cassandra with the flexibility of StreamSets to implement adaptive data cleansing.
A global survey of more than 300 data management professionals conducted by independent research firm Dimensional Research® showed that enterprises of all sizes face challenges on a range of key data performance management issues from stopping bad data to keeping data flows operating effectively. In particular, 87 percent of respondents report flowing bad data into their data stores while just 12 percent consider themselves good at the key aspects of data flow performance management.
This presentation is an attempt to demystify the practice of building reliable data processing pipelines. We go through the pieces needed to build a stable processing platform: data ingestion, processing engines, workflow management, schemas, and pipeline development processes. The presentation also includes component choice considerations and recommendations, as well as best practices and pitfalls to avoid, most learnt through expensive mistakes.
In this talk, we provide an introduction to Python Luigi via real life case studies showing you how you can break large, multi-step data processing task into a graph of smaller sub-tasks that are aware of the state of their interdependencies.
Growth Intelligence tracks the performance and activity of all the companies in the UK economy using their data ‘footprint’. This involves tracking numerous unstructured data points from multiple sources in a variety of formats and transforming them into a standardised feature set we can use for building predictive models for our clients.
In the past, this data was collected in a somewhat haphazard fashion: a combination of manual effort and ad hoc scripting and processing that was difficult to maintain. In order to streamline the data flows, we’re using an open-source Python framework from Spotify called Luigi. Luigi was created for managing task dependencies, monitoring the progress of the data pipeline and providing frameworks for common batch processing tasks.
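Neither abstract includes its actual task code, so here is a minimal, self-contained Luigi sketch of the pattern both describe: small tasks declaring their dependencies, with Luigi re-running only tasks whose outputs are missing. The task and file names are placeholders, not Growth Intelligence's real pipeline.

```python
# Minimal Luigi sketch of a dependency-aware pipeline: Extract -> BuildFeatures.
# Luigi reruns only tasks whose output targets are missing, which is how a large
# job is broken into smaller, restartable sub-tasks. Names are placeholders.
import json

import luigi


class ExtractRawData(luigi.Task):
    def output(self):
        return luigi.LocalTarget("raw_companies.json")

    def run(self):
        with self.output().open("w") as f:
            json.dump([{"name": "Acme Ltd", "employees": 12}], f)


class BuildFeatures(luigi.Task):
    def requires(self):
        return ExtractRawData()

    def output(self):
        return luigi.LocalTarget("features.csv")

    def run(self):
        with self.input().open() as f:
            companies = json.load(f)
        with self.output().open("w") as f:
            f.write("name,employees\n")
            for c in companies:
                f.write(f"{c['name']},{c['employees']}\n")


if __name__ == "__main__":
    luigi.build([BuildFeatures()], local_scheduler=True)
```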
UX, ethnography and possibilities: for Libraries, Museums and Archives - Ned Potter
These slides are adapted from a talk I gave at the Welsh Government's Marketing Awards for the LAM sector, in 2017.
It offers a primer on UX - User Experience - and how ethnography and design might be used in the library, archive and museum worlds to better understand our users. All good marketing starts with audience insight.
The presentation covers the following:
1) An introduction to UX
2) Ethnography, with definitions and examples of 7 ethnographic techniques
3) User-centred design and Design Thinking
4) Examples of UX-led changes made at institutions in the UK and Scandinavia
5) Next Steps - if you'd like to try out UX at your own organisation
The technologies and people we are designing experiences for are constantly changing; in most cases, they are changing at a rate that is difficult to keep up with. When we think about how our teams are structured and the design processes we use in light of this challenge, a new design problem (or problem space) emerges, one that requires us to focus inward. How do we structure our teams and processes to be resilient? What would happen if we looked at our teams and design process as IAs, designers, and researchers? What strategies would we put in place to help them be successful? This talk will look at challenges we face leading, supporting, or simply being a part of design teams creating experiences for user groups with changing technological needs.
Pivotal cloud cache for .net microservices - Jagdish Mirani
In-memory caching is not new technology, but it takes on renewed significance with cloud-native, distributed application architectures. Modern day caching can alleviate the performance and availability challenges associated with cloud-native, distributed architectures.
This presentation explores the unique characteristics of modern, distributed application architectures that make caching a vital part of the solution.
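As a small, generic illustration of why such a cache helps (this is not Pivotal Cloud Cache's API, which is GemFire-based and exposed to .NET clients), the Python sketch below shows the cache-aside pattern: read from the cache first and fall back to the slower system of record only on a miss.

```python
# Generic cache-aside sketch (illustrative only; not the Pivotal Cloud Cache API):
# check an in-memory cache first, and only hit the slow system of record on a miss.
import time

cache = {}                      # stand-in for an in-memory cache cluster
CACHE_TTL_SECONDS = 30


def load_from_database(user_id):
    time.sleep(0.2)             # simulate a slow backing store
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    entry = cache.get(user_id)
    if entry and time.time() - entry["cached_at"] < CACHE_TTL_SECONDS:
        return entry["value"]                   # cache hit: no database round trip
    value = load_from_database(user_id)         # cache miss: read through and populate
    cache[user_id] = {"value": value, "cached_at": time.time()}
    return value


get_user(42)   # slow first call populates the cache
get_user(42)   # subsequent calls within the TTL are served from memory
```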
With SPS 11 for the SAP HANA platform, some major additions to SAP HANA extended application services are planned. On the JavaScript side, we plan to add Google V8 and full support for Node.js. We also plan to add a standard Java runtime (TomEE). The deployment infrastructure is planned to replace the current repository for SAP HANA. Come and see the features of the deployment infrastructure and the new XS Advanced runtimes, how design-time objects will now be managed in Git, and how to utilize the new container concept.
Boost Performance with Scala – Learn From Those Who’ve Done It! - Cécile Poyet
Scalding is a Scala DSL for Cascading. Run on Hadoop, it’s a concise, functional, and very efficient way to build big data applications. One significant benefit of Scalding is that it allows easy porting of Scalding apps from MapReduce to newer, faster execution fabrics.
In this webinar, Cyrille Chépélov, of Transparency Rights Management, will share how his organization boosted the performance of their Scalding apps by over 50% by moving away from MapReduce to Cascading 3.0 on Apache Tez. Dhruv Kumar, Hortonworks Partner Solution Engineer, will then explain how you can interact with data on HDP using Scala and leverage Scala as a programming language to develop Big Data applications.
Boost Performance with Scala – Learn From Those Who’ve Done It! - Hortonworks
Scalding is a Scala DSL for Cascading. Run on Hadoop, it’s a concise, functional, and very efficient way to build big data applications. One significant benefit of Scalding is that it allows easy porting of Scalding apps from MapReduce to newer, faster execution fabrics.
In this webinar, Cyrille Chépélov, of Transparency Rights Management, will share how his organization boosted the performance of their Scalding apps by over 50% by moving away from MapReduce to Cascading 3.0 on Apache Tez. Dhruv Kumar, Hortonworks Partner Solution Engineer, will then explain how you can interact with data on HDP using Scala and leverage Scala as a programming language to develop Big Data applications.
Data Integration with Apache Kafka: What, Why, How - Pat Patterson
Presented at the Orange County Advanced Analytics and Big Data Meetup, June 21, 2019.
Apache Kafka has fast become the dominant messaging technology for the enterprise; if you're a data scientist or data engineer and you have not yet worked with Kafka, that situation will likely change soon! In this session, Pat Patterson, director of evangelism at StreamSets, explains what Kafka is, why it has disrupted the previous generation of messaging products, and how you can use open source products to build dataflow pipelines with Kafka, without writing code.
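The session itself focuses on building pipelines without code, but for readers who have not yet touched Kafka, a minimal kafka-python sketch shows the publish/subscribe model the talk builds on; the broker address and topic name are assumptions.

```python
# Minimal kafka-python sketch of the publish/subscribe model the talk builds on.
# Broker address and topic name are assumptions; requires the kafka-python package.
import json

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("events", {"user": "alice", "action": "download"})
producer.flush()

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,                 # stop iterating when no new messages arrive
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.topic, message.offset, message.value)
```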
Learn about the challenges that come with deploying and operating Kubernetes at scale and how the Mesosphere DC/OS Kubernetes integration helps solve them.
During this presentation, Joerg Schad discusses:
1. Common challenges associated with getting a Kubernetes cluster up and running
2. The basics of running Kubernetes on Mesosphere DC/OS
3. How failure recovery works with the DC/OS-Kubernetes solution
Azure + DataStax Enterprise Powers Office 365 Per User Store - DataStax Academy
We will present our O365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DataStax Enterprise on Azure.
Cloud Foundry Diego, Lattice, Docker and more - Cornelia Davis
Colorado Cloud Foundry Meetup
May 19, 2015
Lattice and Docker with Cornelia Davis
Starting with a comparison of the current core runtime of the Cloud Foundry Elastic Runtime to the new Diego rewrite, we take a tour through how Linux containers can run a variety of image formats, including Docker. We talk about one way that you can get the Diego functionality in Lattice, a container scheduler that runs on a laptop or as a cluster in the cloud. We talk about ways of creating container images, including Cloud Rocker, and we draw it all together with a bunch of demos.
Abstract from the meetup:
What is Lattice (www.lattice.cf)?
Lattice is an open source project for running containerized workloads on a cluster. A Lattice cluster is comprised of a number of Lattice Cells (VMs that run containers) and a Lattice Coordinator that monitors the Cells.
Lattice includes built-in http load-balancing, a cluster scheduler, log aggregation with log streaming and health management.
Lattice containers are described as long-running processes or temporary tasks. Lattice includes support for Linux Containers expressed either as Docker Images or by composing applications as binary code on top of a root file system. Lattice's container pluggability will enable other backends such as Windows or Rocket in the future.
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store - DataStax Academy
We will present our Office 365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
The presentation will feature demos on how you too can build similar applications.
GSJUG: Mastering Data Streaming Pipelines 09May2023 - Timothy Spann
GSJUG: Mastering Data Streaming Pipelines 09May2023
https://www.meetup.com/futureofdata-princeton/events/293233881/
This is a repost from the Garden State Java Users Group Event.
Join me at
https://www.meetup.com/garden-state-java-user-group/events/293229660/
See: https://www.eventbrite.com/e/mastering-data-streaming-pipelines-tickets-627677218457?_ga=2.253257801.1787151623.1682868226-741104479.1678110925
Please note that registration via EventBrite is required to attend either in-person or online.
We are happy to announce that Tim Spann will be our special guest for the May 9, 2023 meeting!
Abstract:
In this session, Tim will show you some best practices that he has discovered over the last seven years in building data streaming applications including IoT, CDC, Logs, and more.
In his modern approach, we utilize several Apache frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Kafka. From there we build streaming ETL with Apache Flink, enhance events with NiFi enrichment. We build continuous queries against our topics with Flink SQL.
We will show where Java fits in as sources, enrichments, NiFi processors and sinks.
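The talk's own examples are not reproduced here, but a short PyFlink sketch can illustrate what a continuous Flink SQL query over a Kafka topic looks like; the topic, field names, and broker address are assumptions, and the Flink Kafka SQL connector jar must be available for it to run.

```python
# Hedged PyFlink sketch of a continuous Flink SQL query over a Kafka topic.
# Topic, fields, and broker address are assumptions; the flink-sql-connector-kafka
# jar must be on the classpath for this to run.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE sensor_events (
        device_id STRING,
        temperature DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'sensor-events',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    CREATE TABLE hot_devices (device_id STRING, max_temp DOUBLE) WITH ('connector' = 'print')
""")

# Continuous query: keeps running and updating results as new events arrive.
t_env.execute_sql("""
    INSERT INTO hot_devices
    SELECT device_id, MAX(temperature) AS max_temp
    FROM sensor_events
    GROUP BY device_id
""").wait()
```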
We hope to see you on May 9!
Speaker
Timothy Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming.
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
This is the talk I gave at the Seattle Spark Meetup in March, 2015. I discussed some Spark Streaming fundamentals, integration points with Kafka, Flume etc.
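The original meetup talk used the 2015-era DStream API; as a hedged, modern illustration of the same Kafka integration point, here is a PySpark Structured Streaming sketch. The topic and broker address are assumptions, and the spark-sql-kafka package must be on the classpath.

```python
# Hedged PySpark Structured Streaming sketch of the Kafka integration point
# discussed in the talk (the talk itself covered the older DStream API).
# Topic and broker address are assumptions; requires the spark-sql-kafka package.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(value AS STRING) AS value", "timestamp")
)

# Count messages per 1-minute window and print the running result to the console.
counts = events.groupBy(window(col("timestamp"), "1 minute")).count()

query = (
    counts.writeStream.outputMode("complete")
    .format("console")
    .option("truncate", "false")
    .start()
)
query.awaitTermination()
```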
With the advent of new open source platforms around Hadoop, NoSQL databases & in-memory databases, the data management stack in the enterprise is undergoing complete re-platforming. Batch and stream processing are two distinct data processing paradigms that need to be supported over this new stack. In this session I will talk about the importance of having a unified batch and stream processing engine and share my learning around -
Sample use cases that bring out the need to have a unified stream & batch processing engine
Important features needed in the unified platform to tackle the above use cases.
Cloudera Operational DB (Apache HBase & Apache Phoenix) - Timothy Spann
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Using Apache NiFi 1.10 to read/write from HBase
Dec 2019, Timothy Spann, Field Engineer, Data in Motion
Princeton Meetup 10-dec-2019
https://www.meetup.com/futureofdata-princeton/events/266496424/
Hosted By PGA Fund at:
https://pga.fund/coworking-space/
Princeton Growth Accelerator
5 Independence Way, 4th Floor, Princeton, NJ
Similar to Logging infrastructure for Microservices using StreamSets Data Collector
Introducing a horizontally scalable, inference-based business Rules Engine fo... - Cask Data
Speaker: Nitin Motgi, Cask
Big Data Applications Meetup, 09/20/2017
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to video: https://www.youtube.com/watch?v=FnQwDaKii2U
About the talk:
Business Rules are statements that describe business policies or procedures to process data. Rules engines or inference engines execute business rules in a runtime production environment, and have become commonplace for many IT applications. In the world of big data, however, there has been a gap: no horizontally scalable, lightweight, inference-based business rules engine for big data processing.
In this session, you learn about Cask’s new business Rules Engine built on top of CDAP, which is a sophisticated if-then-else statement interpreter that runs natively on big data systems such as Spark, Hadoop, Amazon EMR, Azure HDInsight and GCE. It provides an alternative computational model for transforming your data while empowering business users to specify and manage the transformations and policy enforcements.
In his talk, Nitin Motgi, Cask co-founder and CTO, demonstrates this new, distributed rules engine and explains how business users in big data environments can make decisions on their data, enforce policies, and be an integral part of the data ingestion and ETL process. He also shows how business users can write, manage, deploy, execute and monitor business data transformation and policy enforcements.
Check out http://bdam.io/ for more info on the Big Data Apps meetup!
Transactions in HBase, by Andreas Neumann, Cask - Cask Data
Title: Transactions in HBase
Speaker: Andreas Neumann, Cask
ApacheCon Big Data, Miami, FL
May 18, 2017
Abstract:
In the age of NoSQL, big data storage engines such as HBase have given up ACID semantics of traditional relational databases, in exchange for high scalability and availability. However, it turns out that in practice, many applications require consistency guarantees to protect data from concurrent modification in a massively parallel environment. In the past few years, several transaction engines have been proposed as add-ons to HBase: Three different engines, namely Omid, Tephra, and Trafodion were open-sourced within the Apache ecosystem alone. In this talk, Andreas Neumann will introduce and compare the different approaches from various perspectives including scalability, efficiency, operability and portability, and make recommendations pertaining to different use cases.
Speaker Bio:
Andreas Neumann develops big data software at Cask, and has formerly done so at places that are known for massive scale. He was the chief architect for Hadoop at Yahoo! and also for the foundational content management system that Yahoo! built on Hadoop. Previously he was a research engineer at Yahoo! and a search architect at IBM. Andreas holds a doctoral degree in computer science for his work on querying XML documents.
#BDAM: EDW Optimization with Hadoop and CDAP, by Sagar Kapare from Cask - Cask Data
Speaker: Sagar Kapare, Cask
Big Data Applications Meetup, 05/10/2017
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to video: https://youtu.be/mSKwjKvYUtI
About the talk:
The cost of maintaining a traditional Enterprise Data Warehouse (EDW) is skyrocketing as legacy systems buckle under the weight of exponentially growing data and increasingly complex processing needs. Hadoop, with its massive horizontal scalability, and CDAP, which offers pre-built pipelines for EDW offload in a drag-and-drop studio environment, can help.
Sagar will demonstrate Cask’s solution, which shows how to build code-free, scalable, and enterprise-grade pipelines for delivering an easy-to-use and efficient EDW offload solution. He will also show how interactive data preparation, data pipeline automation, and fast querying capabilities over voluminous data can help unlock new use-cases.
"Who Moved my Data? - Why tracking changes and sources of data is critical to...Cask Data
Speaker: Russ Savage, from Cask
Big Data Applications Meetup, 09/14/2016
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to talk: https://youtu.be/4j78g3WvC4Y
About the talk:
As data lake sizes grow, and more users begin exploring and including that data in their everyday analysis, keeping track of the sources for data becomes critical. Understanding how a dataset was generated and who is using it allows users and companies to ensure their analysis is leveraging the most accurate and up to date information. In this talk, we will explore the different techniques available to keep track of your data in your data lake and demonstrate how we at Cask approached and attempted to mitigate this issue.
Building Enterprise Grade Applications in YARN with Apache Twill - Cask Data
Speaker: Poorna Chandra, from Cask
Big Data Applications Meetup, 07/27/2016
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to talk: https://www.youtube.com/watch?v=I1GLRXyQlx8
About the talk:
Twill is an Apache incubator project that provides a higher-level abstraction for building distributed systems applications on YARN. Developing distributed applications directly on YARN is challenging because it does not provide higher-level APIs, and lots of boilerplate code needs to be duplicated to deploy applications. Developing YARN applications is typically done by framework developers, like those familiar with Apache Flink or Apache Spark, who need to deploy the framework in a distributed way.
By using Twill, application developers need only be familiar with the basics of the Java programming model when using the Twill APIs, so they can focus on solving business problems. In this talk, I present how Twill can be leveraged, using the Cask Data Application Platform (CDAP), which relies heavily on Twill for resource management, as an example.
Cask Webinar
Date: 08/10/2016
Link to video recording: https://www.youtube.com/watch?v=XUkANr9iag0
In this webinar, Nitin Motgi, CTO of Cask, walks through the new capabilities of CDAP 3.5 and explains how your organization can benefit.
Some of the highlights include:
- Enterprise-grade security - Authentication, authorization, secure keystore for storing configurations. Plus integration with Apache Sentry and Apache Ranger.
- Preview mode - Ability to preview and debug data pipelines before deploying them.
- Joins in Cask Hydrator - Capabilities to join multiple data sources in data pipelines
- Real-time pipelines with Spark Streaming - Drag & drop real-time pipelines using Spark Streaming.
- Data usage analytics - Ability to report application usage of data sets.
- And much more!
Transactions for Apache HBase™: Apache Tephra provides globally consistent transactions on top of Apache HBase. While HBase provides strong consistency with row- or region-level ACID operations, it sacrifices cross-region and cross-table consistency in favor of scalability. This trade-off requires application developers to handle the complexity of ensuring consistency when their modifications span region boundaries. By providing support for global transactions that span regions, tables, or multiple RPCs, Tephra simplifies application development on top of HBase, without a significant impact on performance or scalability for many workloads.
ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating), by Poor...Cask Data
TITLE: ACID Transactions in Apache Phoenix with Apache Tephra™ (incubating)
SPEAKER: Poorna Chandra, Cask Data
DATE: May 25, 2016
LOCATION: PhoenixCon, San Francisco CA
http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182/
TALK ABSTRACT:
This talk covers how Apache Phoenix added support for ACID transactions using Apache Tephra™ (incubating), an open source transaction engine on top of Apache HBase. It starts by examining why Phoenix data operations need to be transactional, then discusses how Tephra implements transactional semantics using optimistic concurrency control, with an overview of Tephra's transaction model and high-level architecture. The talk then describes the details of the Phoenix integration with Tephra and presents some performance benchmark results for Phoenix operations with transactions, concluding with a discussion of some challenges in scaling Tephra and potential solutions.
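As context for the integration described above, here is a hedged sketch of how a client might use Phoenix transactional tables over JDBC; the connection string and table are placeholders, and the cluster must have Phoenix transactions enabled (phoenix.transactions.enabled=true) for the TRANSACTIONAL table property to work.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class PhoenixTransactionExample {
      public static void main(String[] args) throws Exception {
        // "zkhost" is a placeholder for the HBase/ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost")) {
          conn.setAutoCommit(false);

          try (Statement stmt = conn.createStatement()) {
            // Declaring the table TRANSACTIONAL routes its writes through Tephra.
            stmt.execute("CREATE TABLE IF NOT EXISTS ITEMS ("
                + "ID BIGINT NOT NULL PRIMARY KEY, NAME VARCHAR) TRANSACTIONAL=true");

            // Both upserts become visible atomically at commit, or not at all.
            stmt.executeUpdate("UPSERT INTO ITEMS VALUES (1, 'camera')");
            stmt.executeUpdate("UPSERT INTO ITEMS VALUES (2, 'drone')");
          }
          conn.commit();  // conflicts are detected at commit time (optimistic concurrency control)
        }
      }
    }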
Introducing Athena: 08/19 Big Data Application Meetup, Talk #3 Cask Data
Speaker: Yuanchi Ning from Uber
Big Data Applications Meetup, 08/19/2015
Palo Alto, CA
More info here: http://www.meetup.com/BigDataApps/
Link to talk: https://www.youtube.com/watch?v=SY1YSU8cFLI
About the talk:
Athena is a stream processing platform for Uber's near-real-time analytics applications, built using Samza. We will discuss some of the existing and upcoming use cases and how they impact Uber partners and riders. The talk will go through the tooling built around Samza for easier user onboarding, such as a deployment manager, integration with the Typesafe config system, a unit test framework, Graphite integration, metric whitelisting, and so on. We'll also go over some of the issues observed along the way.
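For readers new to Samza, the sketch below shows the basic shape of a task in Samza's low-level API, the kind of building block a platform like Athena wraps with its tooling; the stream names and logic are hypothetical and not taken from the talk.

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class TripEventTask implements StreamTask {
      // Hypothetical Kafka output topic; not from the talk.
      private static final SystemStream OUTPUT = new SystemStream("kafka", "trip-metrics");

      @Override
      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        Object event = envelope.getMessage();
        // ... derive a metric or enriched record from the incoming event ...
        collector.send(new OutgoingMessageEnvelope(OUTPUT, event));
      }
    }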
NRT Event Processing with Guaranteed Delivery of HTTP Callbacks, HBaseCon 2015Cask Data
HBaseCon 2015
May 7
San Francisco
This talk at HBaseCon was given by Poorna Chandra from Cask and Alan Steckley from Salesforce.com
Here's a short summary of the talk:
Salesforce is building a new service, code-named Webhooks, that enables customers' own systems to respond in near real time to system events and customer behavioral actions from the Salesforce Marketing Cloud. The application must process millions of events per day to meet current needs and scale up to billions of events per day in the future, so horizontal scalability is a primary concern. In this talk, they discussed how Webhooks is built using HBase for data storage and the Cask Data Application Platform (CDAP), an open source framework for building applications on Hadoop.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the scientific community's broad response to it have forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team's work applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data migration tools like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
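One widely used technique for the atomicity problem mentioned above is the transactional outbox pattern, in which the state change and the outgoing domain event are written in the same database transaction and a separate relay later publishes the event to the message bus. The sketch below illustrates that general pattern with plain JDBC and hypothetical table names; it is not Wix's actual implementation.

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class OrderService {
      // Writes the state change and the domain event atomically; a separate relay
      // process polls the outbox table and publishes rows to the message bus.
      public void createOrder(Connection conn, long orderId, String payload) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement insertOrder = conn.prepareStatement(
                 "INSERT INTO orders (id, payload) VALUES (?, ?)");
             PreparedStatement insertEvent = conn.prepareStatement(
                 "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)")) {
          insertOrder.setLong(1, orderId);
          insertOrder.setString(2, payload);
          insertOrder.executeUpdate();

          insertEvent.setLong(1, orderId);
          insertEvent.setString(2, "OrderCreated");
          insertEvent.setString(3, payload);
          insertEvent.executeUpdate();

          conn.commit();   // the update and the event become visible together, or not at all
        } catch (Exception e) {
          conn.rollback();
          throw e;
        }
      }
    }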
Modern design is crucial in today's digital environment, and this is especially true for SharePoint intranets. The design of these digital hubs is critical to user engagement and productivity enhancement. They are the cornerstone of internal collaboration and interaction within enterprises.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Experience our free, in-depth three-part Tendenci Platform Corporate Membership Management workshop series! In Session 1 on May 14th, 2024, we began with an Introduction and Setup, mastering the configuration of your Corporate Membership Module settings to establish membership types, applications, and more. Then, on May 16th, 2024, in Session 2, we focused on binding individual members to a Corporate Membership and Corporate Reps, teaching you how to add individual members and assign Corporate Representatives to manage dues, renewals, and associated members. Finally, on May 28th, 2024, in Session 3, we covered questions and concerns, addressing any queries or issues you may have.
For more Tendenci AMS events, check out www.tendenci.com/events
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Logging infrastructure for Microservices using StreamSets Data Collector
1. Logging Infrastructure for Microservices using StreamSets Data Collector
Presenter: Virag Kothari, Software Engineer at StreamSets