Jie Chen, Manager Advisory, KPMG
Data is the new oil. However, many organizations have fragmented data in siloed line of businesses. In this topic, we will focus on identifying the legacy patterns and their limitations and introducing the new patterns packed by Kafka's core design ideas. The goal is to tirelessly pursue better solutions for organizations to overcome the bottleneck in data pipelines and modernize the digital assets for ready to scale their businesses. In summary, we will walk through three uses cases, recommend Dos and Donts, Take aways for Data Engineers, Data Scientist, Data architect in developing forefront data oriented skills.
Prezentace z webináře dne 10.3.2022
Prezentovali:
Jaroslav Malina - Senior Channel Sales Manager, Oracle
Josef Krejčí - Technology Sales Consultant, Oracle
Josef Šlahůnek - Cloud Systems sales Consultant, Oracle
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. This video is a replay of a Live Webinar hosted on 03/19/2020.
Join us for a timely 45min webinar to see our take on the future of Data Integration. As the global industry shift towards the “Fourth Industrial Revolution” continues, outmoded styles of centralized batch processing and ETL tooling continue to be replaced by realtime, streaming, microservices and distributed data architecture patterns.
This webinar will start with a brief look at the macro-trends happening around distributed data management and how that affects Data Integration. Next, we’ll discuss the event-driven integrations provided by GoldenGate Big Data, and continue with a deep-dive into some essential patterns we see when replicating Database change events into Apache Kafka. In this deep-dive we will explain how to effectively deal with issues like Transaction Consistency, Table/Topic Mappings, managing the DB Change Stream, and various Deployment Topologies to consider. Finally, we’ll wrap up with a brief look into how Stream Processing will help to empower modern Data Integration by supplying realtime data transformations, time-series analytics, and embedded Machine Learning from within data pipelines.
GoldenGate: https://www.oracle.com/middleware/tec...
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Prezentace z webináře dne 10.3.2022
Prezentovali:
Jaroslav Malina - Senior Channel Sales Manager, Oracle
Josef Krejčí - Technology Sales Consultant, Oracle
Josef Šlahůnek - Cloud Systems sales Consultant, Oracle
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. This video is a replay of a Live Webinar hosted on 03/19/2020.
Join us for a timely 45min webinar to see our take on the future of Data Integration. As the global industry shift towards the “Fourth Industrial Revolution” continues, outmoded styles of centralized batch processing and ETL tooling continue to be replaced by realtime, streaming, microservices and distributed data architecture patterns.
This webinar will start with a brief look at the macro-trends happening around distributed data management and how that affects Data Integration. Next, we’ll discuss the event-driven integrations provided by GoldenGate Big Data, and continue with a deep-dive into some essential patterns we see when replicating Database change events into Apache Kafka. In this deep-dive we will explain how to effectively deal with issues like Transaction Consistency, Table/Topic Mappings, managing the DB Change Stream, and various Deployment Topologies to consider. Finally, we’ll wrap up with a brief look into how Stream Processing will help to empower modern Data Integration by supplying realtime data transformations, time-series analytics, and embedded Machine Learning from within data pipelines.
GoldenGate: https://www.oracle.com/middleware/tec...
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
In this presentation, we show how Data Reply helped an Austrian fintech customer to overcome previous performance limitations in their data analytics landscape, leverage real-time pipelines, break down monoliths, and foster a self-service data culture to enable new event-driven and business-critical use cases.
Introducing Events and Stream Processing into Nationwide Building Societyconfluent
Watch this talk here: https://www.confluent.io/online-talks/introducing-events-and-stream-processing-nationwide-building-society
Open Banking regulations compel the UK’s largest banks, and building societies to enable their customers to share personal information with other regulated companies securely. As a result companies such as Nationwide Building Society are re-architecting their processes and infrastructure around customer needs to reduce the risk of losing relevance and the ability to innovate.
In this online talk, you will learn why, when facing Open Banking regulation and rapidly increasing transaction volumes, Nationwide decided to take load off their back-end systems through real-time streaming of data changes into Apache Kafka®. You will hear how Nationwide started their journey with Apache Kafka®, beginning with the initial use case of creating a real-time data cache using Change Data Capture, Confluent Platform and Microservices. Rob Jackson, Head of Application Architecture, will also cover how Confluent enabled Nationwide to build the stream processing backbone that is being used to re-engineer the entire banking experience including online banking, payment processing and mortgage applications.
View now to:
-Explore the technologies used by Nationwide to meet the challenges of Open Banking
-Understand how Nationwide is using KSQL and Kafka Streams Framework to join topics and process data.
-Learn how Confluent Platform can enable enterprises such as Nationwide to embrace the event streaming paradigm
-See a working demo of the Nationwide system and what happens when the underlying infrastructure breaks.
White Paper: Rethink Storage: Transform the Data Center with EMC ViPR Softwar...EMC
This white paper discusses the software-defined data center (SDCC) and challenges of heterogeneous storage silos in making SDDC a reality. It introduces EMC ViPR software-defined storage, which enables enterprise IT departments and service providers to transform physical storage arrays into simple, extensible, open virtual storage platform.
MuleSoft Meetup Singapore #8 March 2021Julian Douch
Best of Both - Kafka & MuleSoft. Two of the hottest technologies on the market for integrating and distributing data across the enterprise and with end-users of digital services. In this session Senthilkumar from MuleSoft will walkthrough these technologies, provide understanding to the use-cases where both can be leveraged in together to deliver the best digital services.
IOT & Enterprise Connectivity Using MuleSoft. There's no questioning that Internet of Things (IoT) creates tremendous business opportunities. According to its definition from the web, the Internet of Things (IoT) describes the network of physical objects - “things”- that are embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet. These devices range from ordinary household objects to sophisticated industrial tools. With more than 7 billion connected IoT devices today, experts are expecting this number to grow to 10 billion by 2020 and 22 billion by 2025. But how do these devices securely connect with your Enterprise Business Applications?
In this session you will learn fundamentals about IoT, architecture, devices and how we can leverage MuleSoft. Giap Hui Tan, Principal Consultant and IoT enthusiast, will be sharing his knowledge, discuss the architecture with application of IoT and MuleSoft which also comes with API and hardware demo.
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven Analytics, Machine Learning and Data Lakes which is where data management tech really shines. Join us for this presentation where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
Completely transform the way Cloud apps access data
Progress® DataDirect® Hybrid Data Pipeline is the industry’s first hybrid data pipeline that can run independently and integrate with any single or multi-vendor technology stack connected by open standards for SQL and REST.
Transforming Enterprise IT - Virtual Transformation Day Feb 2019Amazon Web Services
Speaker: Wesley Wilks, Dan Gallivan
As more and more enterprises start down the path of their digital transformation, the pressure on their IT organizations to support innovation across the business couldn’t be higher. In this session, we will outline a number of cutting-edge technologies as well as an operating model that will allow IT to position itself as a business enabler and not a blocker. We will be sharing some mechanisms that will enable the IT organization to meet the pace of innovation that is being set by the business while giving them the flexibility to leverage existing assets.
AWS Transformation Day is designed for enterprise organizations looking to make the move to the cloud in order to become more responsive, agile and innovative, while still staying secure and compliant. Join us for this virtual event and we'll share our experiences of helping enterprise customers accelerate the pace of migration and adoption of strategic services.
We recommend this event for IT and business leaders who are looking to create sustainable benefits and a competitive advantage by using the AWS Cloud.
Introducing Events and Stream Processing into Nationwide Building Society (Ro...confluent
Facing Open Banking regulation, rapidly increasing transaction volumes and increasing customer expectations, Nationwide took the decision to take load off their back-end systems through real-time streaming of data changes into Kafka. Hear about how Nationwide started their journey with Kafka, from their initial use case of creating a real-time data cache using Change Data Capture, Kafka and Microservices to how Kafka allowed them to build a stream processing backbone used to reengineer the entire banking experience including online banking, payment processing and mortgage applications. See a working demo of the system and what happens to the system when the underlying infrastructure breaks. Technologies covered include: Change Data Capture, Kafka (Avro, partitioning and replication) and using KSQL and Kafka Streams Framework to join topics and process data.
Introductory session on Cloud Computing Technology and Current market offerings from various vendors.
Evolving of Cloud base platform, Need of Cloud and Benefits with its some limitations and current challenges.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Pivotal Big Data Suite: A Technical OverviewVMware Tanzu
How and why are companies like Uber, Netflix and AirBnB so successful, what you need to in order to become successful in the same way that they are and how Pivotal can help you with that.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Amazon Managed Blockchain and Quantum Ledger Database QLDBJohn Yeung
Introduce two new products: Amazon Managed Blockchain and Quantum Ledger Database QLDB. Explain the key differences between these products and how to choose them for different scenarios. Amazon Managed Blockchain is a managed services supporting two key Blockchain frameworks, namely Hyperledger Fabric and Ethereum, while QLDB aims to provide a managed database service with the immutable and verifiable capabilities. Can they be working together to achieve a higher value? Let's read the document.
In this presentation, we show how Data Reply helped an Austrian fintech customer to overcome previous performance limitations in their data analytics landscape, leverage real-time pipelines, break down monoliths, and foster a self-service data culture to enable new event-driven and business-critical use cases.
Introducing Events and Stream Processing into Nationwide Building Societyconfluent
Watch this talk here: https://www.confluent.io/online-talks/introducing-events-and-stream-processing-nationwide-building-society
Open Banking regulations compel the UK’s largest banks, and building societies to enable their customers to share personal information with other regulated companies securely. As a result companies such as Nationwide Building Society are re-architecting their processes and infrastructure around customer needs to reduce the risk of losing relevance and the ability to innovate.
In this online talk, you will learn why, when facing Open Banking regulation and rapidly increasing transaction volumes, Nationwide decided to take load off their back-end systems through real-time streaming of data changes into Apache Kafka®. You will hear how Nationwide started their journey with Apache Kafka®, beginning with the initial use case of creating a real-time data cache using Change Data Capture, Confluent Platform and Microservices. Rob Jackson, Head of Application Architecture, will also cover how Confluent enabled Nationwide to build the stream processing backbone that is being used to re-engineer the entire banking experience including online banking, payment processing and mortgage applications.
View now to:
-Explore the technologies used by Nationwide to meet the challenges of Open Banking
-Understand how Nationwide is using KSQL and Kafka Streams Framework to join topics and process data.
-Learn how Confluent Platform can enable enterprises such as Nationwide to embrace the event streaming paradigm
-See a working demo of the Nationwide system and what happens when the underlying infrastructure breaks.
White Paper: Rethink Storage: Transform the Data Center with EMC ViPR Softwar...EMC
This white paper discusses the software-defined data center (SDCC) and challenges of heterogeneous storage silos in making SDDC a reality. It introduces EMC ViPR software-defined storage, which enables enterprise IT departments and service providers to transform physical storage arrays into simple, extensible, open virtual storage platform.
MuleSoft Meetup Singapore #8 March 2021Julian Douch
Best of Both - Kafka & MuleSoft. Two of the hottest technologies on the market for integrating and distributing data across the enterprise and with end-users of digital services. In this session Senthilkumar from MuleSoft will walkthrough these technologies, provide understanding to the use-cases where both can be leveraged in together to deliver the best digital services.
IOT & Enterprise Connectivity Using MuleSoft. There's no questioning that Internet of Things (IoT) creates tremendous business opportunities. According to its definition from the web, the Internet of Things (IoT) describes the network of physical objects - “things”- that are embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet. These devices range from ordinary household objects to sophisticated industrial tools. With more than 7 billion connected IoT devices today, experts are expecting this number to grow to 10 billion by 2020 and 22 billion by 2025. But how do these devices securely connect with your Enterprise Business Applications?
In this session you will learn fundamentals about IoT, architecture, devices and how we can leverage MuleSoft. Giap Hui Tan, Principal Consultant and IoT enthusiast, will be sharing his knowledge, discuss the architecture with application of IoT and MuleSoft which also comes with API and hardware demo.
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven Analytics, Machine Learning and Data Lakes which is where data management tech really shines. Join us for this presentation where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
Completely transform the way Cloud apps access data
Progress® DataDirect® Hybrid Data Pipeline is the industry’s first hybrid data pipeline that can run independently and integrate with any single or multi-vendor technology stack connected by open standards for SQL and REST.
Transforming Enterprise IT - Virtual Transformation Day Feb 2019Amazon Web Services
Speaker: Wesley Wilks, Dan Gallivan
As more and more enterprises start down the path of their digital transformation, the pressure on their IT organizations to support innovation across the business couldn’t be higher. In this session, we will outline a number of cutting-edge technologies as well as an operating model that will allow IT to position itself as a business enabler and not a blocker. We will be sharing some mechanisms that will enable the IT organization to meet the pace of innovation that is being set by the business while giving them the flexibility to leverage existing assets.
AWS Transformation Day is designed for enterprise organizations looking to make the move to the cloud in order to become more responsive, agile and innovative, while still staying secure and compliant. Join us for this virtual event and we'll share our experiences of helping enterprise customers accelerate the pace of migration and adoption of strategic services.
We recommend this event for IT and business leaders who are looking to create sustainable benefits and a competitive advantage by using the AWS Cloud.
Introducing Events and Stream Processing into Nationwide Building Society (Ro...confluent
Facing Open Banking regulation, rapidly increasing transaction volumes and increasing customer expectations, Nationwide took the decision to take load off their back-end systems through real-time streaming of data changes into Kafka. Hear about how Nationwide started their journey with Kafka, from their initial use case of creating a real-time data cache using Change Data Capture, Kafka and Microservices to how Kafka allowed them to build a stream processing backbone used to reengineer the entire banking experience including online banking, payment processing and mortgage applications. See a working demo of the system and what happens to the system when the underlying infrastructure breaks. Technologies covered include: Change Data Capture, Kafka (Avro, partitioning and replication) and using KSQL and Kafka Streams Framework to join topics and process data.
Introductory session on Cloud Computing Technology and Current market offerings from various vendors.
Evolving of Cloud base platform, Need of Cloud and Benefits with its some limitations and current challenges.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Pivotal Big Data Suite: A Technical OverviewVMware Tanzu
How and why are companies like Uber, Netflix and AirBnB so successful, what you need to in order to become successful in the same way that they are and how Pivotal can help you with that.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Amazon Managed Blockchain and Quantum Ledger Database QLDBJohn Yeung
Introduce two new products: Amazon Managed Blockchain and Quantum Ledger Database QLDB. Explain the key differences between these products and how to choose them for different scenarios. Amazon Managed Blockchain is a managed services supporting two key Blockchain frameworks, namely Hyperledger Fabric and Ethereum, while QLDB aims to provide a managed database service with the immutable and verifiable capabilities. Can they be working together to achieve a higher value? Let's read the document.
Similar to Data Con LA 2022 - Data Streaming with Kafka (20)
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
Mike Limcaco, Analytics Specialist / Customer Engineer at Google
Measure trends in a particular topic or search term across Google Search across the US down to the city-level. Integrate these data signals into analytic pipelines to drive product, retail, media (video, audio, digital content) recommendations tailored to your audience segment. We'll discuss how Google unique datasets can be used with Google Cloud smart analytic services to process, enrich and surface the most relevant product or content that matches the ever-changing interests of your local customer segment.
Melinda Thielbar, Data Science Practice Lead and Director of Data Science at Fidelity Investments
From corporations to governments to private individuals, most of the AI community has recognized the growing need to incorporate ethics into the development and maintenance of AI models. Much of the current discussion, though, is meant for leaders and managers. This talk is directed to data scientists, data engineers, ML Ops specialists, and anyone else who is responsible for the hands-on, day-to-day of work building, productionalizing, and maintaining AI models. We'll give a short overview of the business case for why technical AI expertise is critical to developing an AI Ethics strategy. Then we'll discuss the technical problems that cause AI models to behave unethically, how to detect problems at all phases of model development, and the tools and techniques that are available to support technical teams in Ethical AI development.
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
Antje Barth, Principal Developer Advocate, AI/ML at AWS & Chris Fregly, Principal Engineer, AI & ML at AWS
The frequency and severity of natural disasters are increasing. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response. Many organizations are accelerating their efforts to make their data publicly available for others to use. Repositories such as the Registry of Open Data on AWS and Humanitarian Data Exchange contain troves of data available for use by developers, data scientists, and machine learning practitioners. In this session, see how a community of developers came together though the AWS Disaster Response hackathon to build models to support natural disaster preparedness and response.
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a Developer Data Platform. Come learn what�s new in the 6.0 release and Atlas following all the recent announcements made at MongoDB World 2022. Topics will include
- Atlas Search which combines 3 systems into one (database, search engine, and sync mechanisms) letting you focus on your product's differentiation.
- Atlas Data Federation to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake and AWS S3 buckets
- Queryable Encryption lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator which analyzes your existing relational schemas and helps you design a new MongoDB schema.
- And more!
Data Con LA 2022 - Real world consumer segmentationData Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transform from ""segment 1"", ""segment 2"", etc. to something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did team across Shopkick change their approach given what Analytics had discovered.
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is one of the well known consumer software brand which at its peak serves 385K+ concurrent users. In this session, We start with looking at how user behavioral data & tax domain events are captured in real time using the event bus and analyzed to drive real time personalization with various TurboTax data pipelines. We will also look at solutions performing analytics which make use of these events, with the help of Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena and Amazon lambda functions. Finally, we look at how SageMaker is used to create the TurboTax model to predict if a customer is at risk or needs help.
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
George Mansoor, Chief Information Systems Officer at California State University
Overview of the CSU Data Architecture on moving on-prem ERP data to the AWS Cloud at scale using Delphix for Data Replication/Virtualization and AWS Data Migration Service (DMS) for data extracts
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
Anand Ranganathan, Chief AI Officer at Unscrambl
Conversational AI is getting more and more widely used for customer support and employee support use-cases. In this session, I'm going to talk about how it can be extended for data analysis and data science use-cases ... i.e., how users can interact with a bot to ask analytical questions on data in relational databases.
This allows users to explore complex datasets using a combination of text and voice questions, in natural language, and then get back results in a combination of natural language and visualizations. Furthermore, it allows collaborative exploration of data by a group of users in a channel in platforms like Microsoft Teams, Slack or Google Chat.
For example, a group of users in a channel can ask questions to a bot in plain English like ""How many cases of Covid were there in the last 2 months by state and gender"" or ""Why did the number of deaths from Covid increase in May 2022"", and jointly look at the results that come back. This facilitates data awareness, data-driven collaboration and joint decision making among teams in enterprises and outside.
In this talk, I'll describe how we can bring together various features including natural-language understanding, NL-to-SQL translation, dialog management, data story-telling, semantic modeling of data and augmented analytics to facilitate collaborate exploration of data using conversational AI.
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
Anil Inamdar, VP & Head of Data Solutions at Instaclustr
The most modernized enterprises utilize polyglot architecture, applying the best-suited database technologies to each of their organization's particular use cases. To successfully implement such an architecture, though, you need a thorough knowledge of the expansive NoSQL data technologies now available.
Attendees of this Data Con LA presentation will come away with:
-- A solid understanding of the decision-making process that should go into vetting NoSQL technologies and how to plan out their data modernization initiatives and migrations.
-- They will learn the types of functionality that best match the strengths of NoSQL key-value stores, graph databases, columnar databases, document-type databases, time-series databases, and more.
-- Attendees will also understand how to navigate database technology licensing concerns, and to recognize the types of vendors they'll encounter across the NoSQL ecosystem. This includes sniffing out open-core vendors that may advertise as “open source,"" but are driven by a business model that hinges on achieving proprietary lock-in.
-- Attendees will also learn to determine if vendors offer open-code solutions that apply restrictive licensing, or if they support true open source technologies like Hadoop, Cassandra, Kafka, OpenSearch, Redis, Spark, and many more that offer total portability and true freedom of use.
Data Con LA 2022 - Intro to Data ScienceData Con LA
Zia Khan, Computer Systems Analyst and Data Scientist at LearningFuze
Data Science tutorial is designed for people who are new to Data Science. This is a beginner level session so no prior coding or technical knowledge is required. Just bring your laptop with WiFi capability. The session starts with a review of what is data science, the amount of data we generate and how companies are using that data to get insight. We will pick a business use case, define the data science process, followed by hands-on lab using python and Jupyter notebook. During the hands-on portion we will work with pandas, numpy, matplotlib and sklearn modules and use a machine learning algorithm to approach the business use case.
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
Mariana Danilovic, Managing Director at Infiom, LLC
We will address:
(1) Community creation and engagement using tokens and NFTs
(2) Organization of DAO structures and ways to incentivize Web3 communities
(3) DeFi business models applied to Web3 ventures
(4) Why Metaverse matters for new entertainment and community engagement models.
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
Curtis ODell, Global Director Data Integrity at Tricentis
Join me to learn about a new end-to-end data testing approach designed for modern data pipelines that fills dangerous gaps left by traditional data management tools—one designed to handle structured and unstructured data from any source. You'll hear how you can use unique automation technology to reach up to 90 percent test coverage rates and deliver trustworthy analytical and operational data at scale. Several real world use cases from major banks/finance, insurance, health analytics, and Snowflake examples will be presented.
Key Learning Objective
1. Data journeys are complex and you have to ensure integrity of the data end to end across this journey from source to end reporting for compliance
2. Data Management tools do not test data, they profile and monitor at best, and leave serious gaps in your data testing coverage
3. Automation with integration to DevOps and DataOps' CI/CD processes are key to solving this.
4. How this approach has impact in your vertical
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
Arif Ansari, Professor at University of Southern California
Super Bowl Ad cost $7 million and each year a few Super Bowl ads go viral. The traditional A/B testing does not predict virality. Some highly shared ones reach over 60 million organic views, which can be more valuable than views on TV. Not only are these voluntary, but they are typically without distraction, and win viewer engagement in the form of likes, comments, or shares. A Super Bowl ad that wins 69 million views on YouTube (e.g., Alexa Mind Reader) costs less than 10 cents per quality view! However, the challenge is triggering virality. We developed a method to predict virality and engineer virality into Ads.
1. Prof. Gerard J. Tellis and co-authors recommended that advertisers use YouTube to tease, test, and tweak (TTT) their ads to maximize sharing and viewing. 2022 saw that maxim put into practice.
2. We developed viral Ads prediction using two scientific models:
a. Prof. Gerard Tellis et al.'s model for viral prediction
b. Deep Learning viral prediction using social media effect
3. The model was able to identify all the top 15 Viral Ads it performed better than the traditional agencies.
4. New proposed method is Tease, Test, Tweak, Target and Spots Ad.
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (10K cardinality), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
Data Con LA 2022 - Building Field-level Lineage from Scratch for Modern Data ...Data Con LA
Xuanzi Han, Senior Software Engineer at Monte Carlo
For modern data teams, lineage is a critical component of the data pipeline root cause and impact analysis workflow, as well as a means of ensuring that data, models, and other data assets are healthy and reliable. That being said, the complexity of SQL queries can make it challenging to build lineage manually, particularly at the field level. Xuanzi Han, a member of Monte Carlo's data and product teams, tackled this challenge head-on by leveraging some of the most popular tools in the modern data stack, including dbt, Airflow, Snowflake, and ANother Tool for Language Recognition (ANTLR). In this talk, they share how they designed the data model, query parser, and larger database design for field-level lineage, highlighting learnings, wrong turns, and best practices developed along the way.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Show drafts
volume_up
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
17. What a Banking Institute’s need to modernize its Legacy System
A Banking Institute has been looking to migrate its legacy system to modern technologies that accelerate fast growing demand
in big data through building a modern data streaming platform as part of Business Operation Brain (BOB).
Reuse the existing data centers, storage, infrastructure and security procedures
Scalable and reliable (million transactions/events per second) with the existing infrastructure
Data must be logged for transaction tracing and auditing (For example, Change Data Capture)
18. What options we have: Kafka and Its Comparables in the marketplace
Not an inclusive options