Abstract:
Join Nick Piette, Director of Evangelism at Talend, as he brings you a deep, technical discussion on the real-world data pipeline that underlies modern sports. Working from real-time instrumentation data collected during play, and using open source tools, Nick will show you how to produce meaningful analytics results in minutes. If you are using Kafka, Spark, or any real-time data science technologies, or even if you are just trying to get a better understanding of them, this event is for you.
Speaker’s bio:
Nick Piette is the Director of Evangelism at Talend. He has spent the last eight years helping enterprises with many different data processing challenges. Nick enjoys sharing the most compelling big data use cases that are changing the world.
Goertek’s Experience with the Qualcomm Virtual Reality (VR) Accelerator Program - AugmentedWorldExpo
A talk from the Develop Track at AWE USA 2018 - the World's #1 XR Conference & Expo in Santa Clara, California, May 30 - June 1, 2018.
Goertek’s Experience with the Qualcomm Virtual Reality (VR) Accelerator Program with
Said Bakadir (Qualcomm)
Allen Chien (Goertek)
Qualcomm Technologies announced the VR HMD Accelerator Program to help device manufacturers quickly develop premium standalone VR HMDs. Goertek is Qualcomm’s primary original device manufacturer (ODM) partner and has developed multiple generations of VR HMD reference designs. This talk will share Goertek’s experiences with Qualcomm Technologies’ HMD Accelerator program and demonstrate how the program enables OEMs to improve their overall development experience and shorten time to commercialization. This program allows them to focus on their own customizations and content while leveraging Goertek’s engineering, design, and manufacturing experience in VR.
http://AugmentedWorldExpo.com
Walmart & IBM Revisit the Linear Road Benchmark - Roger Rea, IBM - Redis Labs
The Linear Road benchmark was devised in 2004 to compare stream data management systems. Walmart selected Linear Road to compare the performance of streaming analytics offerings. IBM implemented the benchmark application using Redis to maintain state and IBM Streams to handle the incoming events and queries. Walmart had to completely revamp the data drivers and test verification to take advantage of the multicore, multithreaded servers available today. Tests were run on the Microsoft Azure cloud to ensure a fair comparison of vendors. Redis and IBM Streams handled nearly 1 billion events in a 3-hour test on a single 16-core Azure node, and 3.8 billion when scaled out to 4 nodes. Come learn about the application and the near-linear scalability of Redis and IBM Streams.
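The abstract doesn't show the implementation, but the state-keeping pattern it describes can be sketched in a few lines. Here a plain dict stands in for the Redis hashes that hold per-car state; the event fields and toll rule are hypothetical simplifications, not the actual Linear Road logic.

```python
# Sketch of the state-keeping pattern described above: each incoming
# position report updates per-car state (a dict standing in for Redis
# hashes), and entering a new road segment triggers a toll charge.
# Field names and the toll rule are hypothetical.

car_state = {}   # car_id -> {"segment": ..., "tolls": ...}

def on_position_report(car_id, segment, toll_for_segment):
    """Update a car's state; charge a toll only when it enters a new segment."""
    state = car_state.setdefault(car_id, {"segment": None, "tolls": 0})
    if state["segment"] != segment:
        state["segment"] = segment
        state["tolls"] += toll_for_segment
    return state["tolls"]

# Car 7 crosses two segments, then re-reports the second without a new charge.
on_position_report(7, segment=1, toll_for_segment=2)
on_position_report(7, segment=2, toll_for_segment=3)
total = on_position_report(7, segment=2, toll_for_segment=3)
```

In the real benchmark this state lives in Redis so that many Streams workers can share it; the update-on-change shape is the same.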
Big Data is everywhere these days. But what is it and how can you use it to fuel your business? Data is as important to organizations as labour and capital, and if organizations can effectively capture, analyze, visualize and apply big data insights to their business goals, they can differentiate themselves from their competitors and outperform them in terms of operational efficiency and the bottom line.
Join this session to understand the different AWS Big Data and Analytics services such as Amazon Elastic MapReduce (Hadoop), Amazon Redshift (Data Warehouse) and Amazon Kinesis (Streaming), when to use them and how they work together.
Reasons to attend:
Learn how AWS can help you process and make better use of your data with meaningful insights.
Learn about Amazon Elastic MapReduce for managed Hadoop processing, and Amazon Redshift, a fully managed petabyte-scale data warehouse.
Learn about real time data processing with Amazon Kinesis.
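Kinesis distributes incoming records across shards by MD5-hashing each record's partition key into a 128-bit number and mapping it into a shard's hash-key range. The sketch below mimics that routing locally (no AWS calls), assuming equal-sized hash-key ranges; it is an illustration of the mechanism, not the boto3 API.

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Mimic Kinesis routing: MD5-hash the partition key to a 128-bit
    integer, then map it into one of num_shards equal hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(h // range_size, num_shards - 1)

# Records with the same partition key always land on the same shard,
# which is what preserves per-key ordering in a Kinesis stream.
shard = shard_for_key("sensor-42", 4)
```

This is why choosing a high-cardinality partition key matters: with few distinct keys, some shards sit idle while others are throttled.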
OVH Analytics Data Compute with Apache Spark as a Service - Meetup OVH Bordeaux - Mojtaba Imani
90% of the data in the world today has been created in the last two years, and the world will be creating 163 zettabytes of data a year by 2025. So how do we process this volume of data?
Apache Spark is a trending open-source distributed general-purpose cluster computing framework. But how do you create a computing cluster quickly and efficiently? Should you do all the network configuration and cluster management yourself? What should you do with your cluster once you no longer need it? Is your cluster secure?
After covering Apache Spark principles and use cases, you will discover OVH Analytics Data Compute: a fast, secure, and efficient Spark cluster as a service that answers all of these questions.
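The core Spark principle the talk alludes to is simple to sketch: transformations run independently on each data partition, and partial results are then merged in a reduce step. The toy word count below illustrates that model in plain Python; it deliberately does not use the Spark API.

```python
from collections import Counter
from functools import reduce

# Toy illustration of Spark's execution model (not the Spark API):
# each "partition" is processed independently, as it would be on a
# worker node, then the partial results are merged in a reduce step.
partitions = [
    ["spark makes big data simple", "spark is fast"],
    ["clusters process data in parallel"],
]

def map_partition(lines):
    """Runs on each worker: count words in one partition."""
    return Counter(word for line in lines for word in line.split())

partials = [map_partition(p) for p in partitions]       # parallel map
word_counts = reduce(lambda a, b: a + b, partials)      # shuffle/merge
```

Spark's value is running exactly this shape of computation across many machines with fault tolerance, which is what makes cluster provisioning (the subject of the talk) the hard part.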
OVH Analytics Data Compute - Apache Spark Cluster as a Service - OVHcloud
You need Apache Spark computation over a big Apache Spark cluster but you don't have the computers?
You don't have enough time to create a cluster of computers and do all the installation and configuration?
You only need a cluster for a few hours, not forever?
Or you just want an easy way to try out the power of Apache Spark? Discover OVH Analytics Data Compute!
How to Build a Scylla Database Cluster that Fits Your Needs - ScyllaDB
Sizing a database cluster makes or breaks your application. Too small, and you can't sustain spikes in usage or recover from a node loss or an operational slowdown. Too big, and your cluster will cost more and waste valuable human resources.
Since different workloads have different requirements, sizing your cluster successfully means optimizing for both throughput and latency; in many cases, however, those requirements contradict each other.
In this webinar, we explain how to reconcile these contradicting forces and build a sustainable cluster that meets both performance and resiliency requirements.
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ... - Dataconomy Media
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can Speed up the World"
Bio:
Ronan Corkery is a kdb+ engineer who has been working with Kx and First Derivatives for the past four years. Currently based at Total Gas and Power, he spent his first two years working with Morgan Stanley.
Abstract:
Ronan's presentation will focus on the vertical industries that Kx's technologies, formerly used only in finance, have been moving into. He will present proven solutions, introduce the overall architecture that Kx uses, and lay out potential opportunities to work with Kx.
Tech Talk: Moneyball - Hitting real-time apps out of the park with Big Memory - MemVerge
A webinar hosted by MemVerge, Intel, NVIDIA, and The Next Platform. Timothy Prickett Morgan, co-editor of The Next Platform, provides his view of the Big Memory category. Mark DeMarseilles of Intel gives an update covering the new Optane Persistent Memory 200 Series. Rob Davis of NVIDIA explains why Big Memory needs low-latency networks to distribute messages, replicate data, and provide high availability, all without jitter. Charles Fan of MemVerge describes Memory Machine software and different use cases, including faster crash recovery, higher VM density, and high-frequency trading.
Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The ... - DevGAMM Conference
This talk will cover the journey of designing and implementing a data platform for the game analytics industry. I will describe the modern data stack: what tools and approaches are available on the market, and how leading game companies engineer data analytics solutions and make better games with data insights.
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their share. I will present a working demonstration of how modern data streaming can be applied to the data acquisition and analysis problem posed by modern motorsports.
Instead of bringing multiple Formula 1 cars to the talk, I will show how we instrumented a high fidelity physics-based automotive simulator to produce realistic data from simulated cars running on the Spa-Francorchamps track. We move data from the cars, to the pits, to the engineers back at HQ.
The result is near real-time visualization and comparison of performance, and a great exposition of how to move data using messaging systems like Kafka, process it in real time with Apache Spark, and then analyze it using SQL with Apache Drill.
Code available here: https://github.com/mapr-demos/racing-time-series
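The kind of near-real-time comparison the talk describes can be sketched as a windowed aggregation over the telemetry stream: average speed per car in tumbling windows. The real demo does this with Kafka and Spark (see the repository above); the field names and window logic below are a simplified, hypothetical stand-in.

```python
from collections import defaultdict

# Hedged sketch of a tumbling-window aggregation over car telemetry:
# average speed per (car, 10-second window). In the demo this runs as a
# Spark streaming job over Kafka; field names here are hypothetical.
def windowed_avg_speed(events, window_s=10):
    """events: iterable of (car_id, timestamp_s, speed_kph) tuples."""
    buckets = defaultdict(list)            # (car_id, window index) -> speeds
    for car, ts, speed in events:
        buckets[(car, int(ts // window_s))].append(speed)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

telemetry = [("car1", 0.5, 200), ("car1", 4.0, 220),
             ("car2", 3.0, 210), ("car1", 12.0, 240)]
averages = windowed_avg_speed(telemetry)
```

Comparing two cars lap by lap then reduces to comparing their per-window aggregates, which is what the pit-wall visualization renders.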
The Rise Of Event Streaming – Why Apache Kafka Changes Everything - Kai Wähner
Business digitalization trends like microservices, the Internet of Things, and machine learning are driving the need to process events at a whole new scale, speed, and efficiency. Traditional solutions like ETL/data integration or messaging were not built to serve these needs.
Today, the open source project Apache Kafka® is used by thousands of companies, including over 60% of the Fortune 100, to power and innovate their businesses by centering their data strategies on event-driven architectures that leverage event streaming. We will discuss the market and technology changes that have given rise to Kafka and to event streaming, and we will introduce the audience to the key aspects of building an event streaming platform with Kafka. Examples of production use cases from the automotive, manufacturing, and transportation sectors will showcase the power of event streaming.
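Kafka's core abstraction, and the reason it suits event-driven architectures, is a partitioned append-only log that many consumer groups read at their own offsets. The toy in-memory sketch below illustrates that idea; it is not the Kafka API, and a real deployment adds partitioning, replication, and persistence.

```python
# Toy illustration of Kafka's core abstraction (not the Kafka API): an
# append-only event log that multiple consumer groups read at their own
# offsets, so one event stream can feed many independent applications.
class Log:
    def __init__(self):
        self.events = []
        self.offsets = {}                  # consumer group -> next offset

    def append(self, event):
        self.events.append(event)

    def poll(self, group):
        """Return unread events for a group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch

log = Log()
log.append({"order": 1})
log.append({"order": 2})
first = log.poll("billing")     # billing sees both events
log.append({"order": 3})
second = log.poll("billing")    # only the new event
fresh = log.poll("analytics")   # a new group replays from the start
```

Because the log is retained rather than consumed destructively, new applications can replay history, which is the property that distinguishes event streaming from traditional messaging.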
Transforming the Database: Critical Innovations for Performance at Scale - ScyllaDB
Your team is serious about ensuring database performance at scale. But legacy NoSQL technology could be eroding the impact of your achievements.
Following best practices for efficient data modeling, query optimization, and observability is fundamental. But their power can be limited – or lifted – by specific database capabilities. Often-overlooked database innovations can serve as a force multiplier, paving a much smoother path to speed at scale (e.g., millions of read/write operations and millisecond P99 response).
This webinar provides a technical deep dive into several such database innovations. ScyllaDB engineers will provide an inside look at innovations dev teams are using to:
- Squeeze every ounce of performance from modern cloud infrastructure
- Accommodate volatile traffic without overprovisioning
- Gain the advantage of external caching without the associated hassle and risks
- Prioritize the performance of latency-sensitive transactional workloads over higher throughput analytics workloads in the same cluster
The increasing demand for computing power in fields such as biology, finance, and machine learning is pushing the adoption of reconfigurable hardware in order to keep up with the required performance level at a sustainable power consumption. Within this context, FPGA devices represent an interesting solution, as they combine the benefits of power efficiency, performance, and flexibility. Nevertheless, the steep learning curve and experience needed to develop efficient FPGA-based systems represent one of the main limiting factors for a broad utilization of such devices.
In this talk, I will first present CAOS, a framework which helps the application designer identify acceleration opportunities and guides them through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, from identifying the kernel functions to accelerate, to optimizing those kernels, to generating the runtime management and the configuration files needed to program the FPGA. After CAOS, I will present the HUGenomics project, based on the CAOS framework. The unique genetic profile of a species is leading to the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit the progress of these fields. The HUGenomics framework aims at facilitating the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. Indeed, the system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture.
Lessons learned building a big data analytics engine, from proprietary to ope... - J On The Beach
Lessons learned building a big data analytics engine, from proprietary to open source by Álvaro Santamaria & Joel Brunger
After the team spent four years building a proprietary all-in-one streaming analytics engine for financial services, it became clear that open source was starting to pull ahead. Álvaro will talk about the challenges of creating an IT operations solution for financial services: what to build, what not to build, and how to use open-source tools to get past the infrastructure and focus on the business problems that matter.
Data Con LA 2022 - Using Google trends data to build product recommendations - Data Con LA
Mike Limcaco, Analytics Specialist / Customer Engineer at Google
Measure trends in a particular topic or search term on Google Search across the US, down to the city level. Integrate these data signals into analytic pipelines to drive product, retail, and media (video, audio, digital content) recommendations tailored to your audience segment. We'll discuss how Google's unique datasets can be used with Google Cloud's smart analytics services to process, enrich, and surface the most relevant product or content matching the ever-changing interests of your local customer segment.
Melinda Thielbar, Data Science Practice Lead and Director of Data Science at Fidelity Investments
From corporations to governments to private individuals, most of the AI community has recognized the growing need to incorporate ethics into the development and maintenance of AI models. Much of the current discussion, though, is meant for leaders and managers. This talk is directed to data scientists, data engineers, ML Ops specialists, and anyone else who is responsible for the hands-on, day-to-day work of building, productionalizing, and maintaining AI models. We'll give a short overview of the business case for why technical AI expertise is critical to developing an AI ethics strategy. Then we'll discuss the technical problems that cause AI models to behave unethically, how to detect problems at all phases of model development, and the tools and techniques that are available to support technical teams in ethical AI development.
Data Con LA 2022 - Improving disaster response with machine learning - Data Con LA
Antje Barth, Principal Developer Advocate, AI/ML at AWS & Chris Fregly, Principal Engineer, AI & ML at AWS
The frequency and severity of natural disasters are increasing. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response. Many organizations are accelerating their efforts to make their data publicly available for others to use. Repositories such as the Registry of Open Data on AWS and the Humanitarian Data Exchange contain troves of data available for use by developers, data scientists, and machine learning practitioners. In this session, see how a community of developers came together through the AWS Disaster Response hackathon to build models to support natural disaster preparedness and response.
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas - Data Con LA
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a Developer Data Platform. Come learn what's new in the 6.0 release and Atlas, following all the recent announcements made at MongoDB World 2022. Topics will include:
- Atlas Search, which combines three systems into one (database, search engine, and sync mechanism), letting you focus on your product's differentiation
- Atlas Data Federation, to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake, and AWS S3 buckets
- Queryable Encryption, which lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator, which analyzes your existing relational schemas and helps you design a new MongoDB schema
- And more!
Data Con LA 2022 - Real world consumer segmentation - Data Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transformed "segment 1", "segment 2", etc. into something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did teams across Shopkick change their approach given what Analytics had discovered?
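The segmentation step described above can be sketched as a minimal k-means loop. In practice the team would likely use `sklearn.cluster.KMeans`; the plain-Python version below shows the algorithm itself, and the toy 2-D user features (visits per week, average basket size) are hypothetical.

```python
# Minimal k-means sketch of the segmentation step described above.
# Toy features and starting centroids are hypothetical.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to its cluster's mean
        # (keep the old centroid if a cluster is empty).
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# (visits/week, avg basket $): two obvious user segments.
users = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 90), (8, 85)]
centroids, clusters = kmeans(users, centroids=[(0, 0), (10, 100)])
```

As the abstract notes, the algorithm is the easy part; naming and acting on the resulting segments is where the analysis work actually lies.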
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo... - Data Con LA
Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is one of the most well-known consumer software brands, serving 385K+ concurrent users at its peak. In this session, we start by looking at how user behavioral data and tax domain events are captured in real time using the event bus and analyzed to drive real-time personalization with various TurboTax data pipelines. We will also look at solutions performing analytics on these events with the help of Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena, and AWS Lambda functions. Finally, we look at how SageMaker is used to create the TurboTax model that predicts whether a customer is at risk or needs help.
Data Con LA 2022 - Moving Data at Scale to AWS - Data Con LA
George Mansoor, Chief Information Systems Officer at California State University
Overview of the CSU data architecture for moving on-prem ERP data to the AWS Cloud at scale, using Delphix for data replication/virtualization and AWS Database Migration Service (DMS) for data extracts.
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI - Data Con LA
Anand Ranganathan, Chief AI Officer at Unscrambl
Conversational AI is increasingly widely used for customer-support and employee-support use cases. In this session, I'm going to talk about how it can be extended to data analysis and data science use cases, i.e., how users can interact with a bot to ask analytical questions of data in relational databases.
This allows users to explore complex datasets using a combination of text and voice questions, in natural language, and then get back results in a combination of natural language and visualizations. Furthermore, it allows collaborative exploration of data by a group of users in a channel in platforms like Microsoft Teams, Slack or Google Chat.
For example, a group of users in a channel can ask questions to a bot in plain English like "How many cases of Covid were there in the last 2 months by state and gender" or "Why did the number of deaths from Covid increase in May 2022", and jointly look at the results that come back. This facilitates data awareness, data-driven collaboration and joint decision making among teams in enterprises and outside.
In this talk, I'll describe how we bring together various features, including natural-language understanding, NL-to-SQL translation, dialog management, data storytelling, semantic modeling of data, and augmented analytics, to facilitate collaborative exploration of data using conversational AI.
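To make the NL-to-SQL idea concrete, here is a deliberately tiny rule-based sketch. Real systems like the one described use ML-based semantic parsing over a semantic model of the data; the single regex pattern, table name, and column names below are hypothetical.

```python
import re

# Toy rule-based sketch of NL-to-SQL translation. Production systems use
# learned semantic parsing; the pattern and schema here are hypothetical.
def nl_to_sql(question):
    m = re.match(r"how many (\w+) were there in (\w+) by (\w+)",
                 question.lower())
    if not m:
        raise ValueError("pattern not recognized")
    entity, region, group_col = m.groups()
    return (f"SELECT {group_col}, COUNT(*) FROM {entity} "
            f"WHERE state = '{region}' GROUP BY {group_col}")

sql = nl_to_sql("How many cases were there in California by gender")
```

Even this toy shows the pipeline shape: parse the question into slots (entity, filter, grouping), then render those slots into SQL against a known schema.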
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ... - Data Con LA
Anil Inamdar, VP & Head of Data Solutions at Instaclustr
The most modernized enterprises utilize polyglot architecture, applying the best-suited database technologies to each of their organization's particular use cases. To successfully implement such an architecture, though, you need a thorough knowledge of the expansive NoSQL data technologies now available.
Attendees of this Data Con LA presentation will come away with:
-- A solid understanding of the decision-making process that should go into vetting NoSQL technologies, and how to plan out data modernization initiatives and migrations.
-- The types of functionality that best match the strengths of NoSQL key-value stores, graph databases, columnar databases, document databases, time-series databases, and more.
-- How to navigate database technology licensing concerns and recognize the types of vendors they'll encounter across the NoSQL ecosystem, including sniffing out open-core vendors that may advertise as "open source" but are driven by a business model that hinges on achieving proprietary lock-in.
-- How to determine whether vendors offer open-code solutions with restrictive licensing, or support true open source technologies like Hadoop, Cassandra, Kafka, OpenSearch, Redis, Spark, and many more that offer total portability and true freedom of use.
Data Con LA 2022 - Intro to Data Science - Data Con LA
Zia Khan, Computer Systems Analyst and Data Scientist at LearningFuze
This Data Science tutorial is designed for people who are new to data science. It is a beginner-level session, so no prior coding or technical knowledge is required; just bring your laptop with WiFi capability. The session starts with a review of what data science is, the amount of data we generate, and how companies are using that data to gain insight. We will pick a business use case and define the data science process, followed by a hands-on lab using Python and a Jupyter notebook. During the hands-on portion we will work with the pandas, numpy, matplotlib, and sklearn modules and use a machine learning algorithm to approach the business use case.
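The hands-on flow the session describes (load, clean, summarize) compresses into a few lines of pandas. The toy churn dataset below is hypothetical, and pandas is assumed to be installed, as it would be in the lab environment.

```python
import pandas as pd

# Compressed sketch of the lab flow: load data, clean a missing value,
# and pull a simple business insight. Toy churn data is hypothetical.
df = pd.DataFrame({
    "customer": ["a", "b", "c", "d", "e"],
    "monthly_spend": [20.0, None, 35.0, 50.0, 15.0],
    "churned": [True, False, False, False, True],
})

# Cleaning step: impute the missing spend with the column mean.
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].mean())

# Insight step: do churned customers spend less on average?
avg_by_churn = df.groupby("churned")["monthly_spend"].mean()
churned_avg = avg_by_churn.loc[True]
retained_avg = avg_by_churn.loc[False]
```

In the session this kind of summary would feed a sklearn model; the load-clean-explore steps above are the part beginners spend most of their time on.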
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment - Data Con LA
Mariana Danilovic, Managing Director at Infiom, LLC
We will address:
(1) Community creation and engagement using tokens and NFTs
(2) Organization of DAO structures and ways to incentivize Web3 communities
(3) DeFi business models applied to Web3 ventures
(4) Why Metaverse matters for new entertainment and community engagement models.
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat... - Data Con LA
Curtis ODell, Global Director Data Integrity at Tricentis
Join me to learn about a new end-to-end data testing approach designed for modern data pipelines that fills dangerous gaps left by traditional data management tools—one designed to handle structured and unstructured data from any source. You'll hear how you can use unique automation technology to reach up to 90 percent test-coverage rates and deliver trustworthy analytical and operational data at scale. Several real-world use cases from major banks/finance, insurance, and health analytics, plus Snowflake examples, will be presented.
Key Learning Objectives
1. Data journeys are complex, and you have to ensure the integrity of the data end to end across this journey, from source to final reporting, for compliance.
2. Data management tools do not test data; they profile and monitor at best, leaving serious gaps in your data testing coverage.
3. Automation with integration into DevOps and DataOps CI/CD processes is key to solving this.
4. How this approach has impact in your vertical.
Data Con LA 2022 - Perfect Viral Ad prediction of Superbowl 2022 using Tease, T... - Data Con LA
Arif Ansari, Professor at University of Southern California
A Super Bowl ad costs $7 million, and each year a few Super Bowl ads go viral. Traditional A/B testing does not predict virality. Some highly shared ads reach over 60 million organic views, which can be more valuable than views on TV. Not only are these views voluntary, but they are typically without distraction and win viewer engagement in the form of likes, comments, or shares. A Super Bowl ad that wins 69 million views on YouTube (e.g., Alexa Mind Reader) costs less than 10 cents per quality view! The challenge, however, is triggering virality. We developed a method to predict virality and engineer virality into ads.
1. Prof. Gerard J. Tellis and co-authors recommended that advertisers use YouTube to tease, test, and tweak (TTT) their ads to maximize sharing and viewing. 2022 saw that maxim put into practice.
2. We developed viral Ads prediction using two scientific models:
a. Prof. Gerard Tellis et al.'s model for viral prediction
b. Deep Learning viral prediction using social media effect
3. The model was able to identify all of the top 15 viral ads and performed better than the traditional agencies.
4. The newly proposed method is Tease, Test, Tweak, Target, and Spot Ads.
Data Con LA 2022 - Embedding medical journeys with machine learning to improve... - Data Con LA
Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (10K cardinality), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
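The intuition behind such embeddings can be shown with a toy distributional sketch: codes that appear in the same claim contexts end up with similar vectors. Production systems (as described above) learn dense low-dimensional vectors with word2vec-style algorithms; the claim codes below are hypothetical and numpy is assumed available.

```python
import numpy as np

# Toy distributional sketch of the idea above: count how often each pair
# of codes shares a claim, then compare codes by the cosine similarity of
# their co-occurrence rows. Codes here are hypothetical.
codes = ["dx_flu", "dx_cold", "rx_antiviral", "px_swab",
         "dx_fracture", "px_xray"]
idx = {c: i for i, c in enumerate(codes)}
claims = [["dx_flu", "rx_antiviral", "px_swab"],
          ["dx_cold", "rx_antiviral", "px_swab"],
          ["dx_fracture", "px_xray"]]

cooc = np.zeros((len(codes), len(codes)))
for claim in claims:
    for a in claim:
        for b in claim:
            if a != b:
                cooc[idx[a], idx[b]] += 1

def sim(a, b):
    u, v = cooc[idx[a]], cooc[idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

flu_cold = sim("dx_flu", "dx_cold")          # share antiviral/swab contexts
flu_fracture = sim("dx_flu", "dx_fracture")  # no shared contexts
```

Embedding algorithms go further by compressing these sparse context counts into dense vectors, which is what makes the 10K-cardinality codes usable as model features.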
Data Con LA 2022 - Data Streaming with Kafka - Data Con LA
Jie Chen, Manager Advisory, KPMG
Data is the new oil. However, many organizations have data fragmented across siloed lines of business. In this talk, we will focus on identifying the legacy patterns and their limitations, and on introducing the new patterns backed by Kafka's core design ideas. The goal is to tirelessly pursue better solutions that help organizations overcome bottlenecks in their data pipelines and modernize their digital assets so they are ready to scale their businesses. In summary, we will walk through three use cases and share dos and don'ts, along with takeaways for data engineers, data scientists, and data architects developing forefront data-oriented skills.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
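For context on how JMeter-style metrics land in InfluxDB: points are written in InfluxDB's line protocol (`measurement,tags fields timestamp`). The sketch below builds one such line; the `jmeter` measurement and tag names mirror what a Backend Listener typically writes, but the exact field set here is an assumption for illustration:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one metric point in InfluxDB line protocol."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_parts = []
    for k, v in sorted(fields.items()):
        if isinstance(v, bool):
            field_parts.append(f"{k}={str(v).lower()}")
        elif isinstance(v, int):
            field_parts.append(f"{k}={v}i")   # integer fields take an 'i' suffix
        elif isinstance(v, float):
            field_parts.append(f"{k}={v}")
        else:
            field_parts.append(f'{k}="{v}"')  # string fields are double-quoted
    return f"{measurement},{tag_str} {','.join(field_parts)} {timestamp_ns}"

line = to_line_protocol(
    "jmeter",                                   # measurement name
    {"application": "demo", "transaction": "login"},
    {"count": 42, "avg": 123.5},                # illustrative fields
    1717000000000000000,
)
# -> jmeter,application=demo,transaction=login avg=123.5,count=42i 1717000000000000000
```

Grafana then queries these series to draw the real-time dashboards shown in the webinar.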
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
"Impact of front-end architecture on development cost", Viktor Turskyi - Fwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
4. 4
So what?
• Cool use case and all, but what's the value?
• Real-time streams from robotic manufacturing (Audi, Ford, BMW, Toyota)
• Real-time traffic analysis for Smart Cities / theme parks (Denver, Cincinnati, London, Disney, Universal)
• Real-time mechanical data from devices (Aircraft - Air France, Windmills – GE)
• And before you discount this whole sports thing…
• The UK tax office collects £1.3B (~$2B USD) in taxes each year from EPL teams
• Greater than the GDP of the bottom 25% of all countries
• $95 billion wagered annually on NFL and college football
• #1 on Forbes 2000 list by a lot…
10. 10
From Seen To Described
Gigabytes of video data become KB/MB of description data
Most applications that do this conversion are proprietary,
but we are seeing investment in the space by the usual suspects
11. 11
Phone home?
Data tends to be JSON or XML
ONVIF standard for security
Messaging vs. web services?
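Since the payloads tend to be JSON, here is a minimal sketch of parsing one hypothetical tracking message; the field names and values are entirely made up for illustration:

```python
import json

# A hypothetical tracking payload in the JSON style such feeds use.
raw = '''{
  "frame": 1024,
  "timestamp_ms": 40960,
  "players": [
    {"id": 7, "team": "home", "x": 12.3, "y": 45.1, "z": 0.0},
    {"id": 10, "team": "away", "x": 50.8, "y": 33.7, "z": 0.0}
  ]
}'''

frame = json.loads(raw)
# Index each player's position by id for downstream analytics.
positions = {p["id"]: (p["x"], p["y"], p["z"]) for p in frame["players"]}
```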
17. 17
• The camera array sends a feed of 25 frames per second
• Each frame captures the x,y,z coordinates of every player
• A live feed of sports data is actually pretty serious Big Data!
Challenges
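A back-of-envelope sketch of why this feed adds up. The sizes are assumptions (22 players plus the ball, roughly 100 bytes per tracked object once serialized): per camera feed the rate is modest, but multiplied across cameras, matches, and seasons it becomes serious streaming volume.

```python
# All three inputs are illustrative assumptions, not measured values.
objects_per_frame = 23       # 22 players + the ball
frames_per_second = 25       # the camera array's feed rate
bytes_per_object = 100       # rough serialized size per tracked object

bytes_per_second = objects_per_frame * frames_per_second * bytes_per_object
frames_per_match = frames_per_second * 90 * 60   # a 90-minute match

# bytes_per_second -> 57500 (about 56 KB/s per feed)
# frames_per_match -> 135000 position snapshots per match
```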
19. 19
• It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
• It lets you store streams of records in a fault-tolerant way.
• It lets you process streams of records as they occur.
Distributed Streaming Platform
Kafka Background
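Those three capabilities can be sketched with a toy in-memory log. This illustrates the model (an append-only log that consumers read by offset), not Kafka's actual API:

```python
class ToyLog:
    """A toy, in-memory stand-in for one Kafka topic partition."""

    def __init__(self):
        self.records = []             # ordered, append-only storage (capability 2)

    def publish(self, record):
        """Producers append records (capability 1); returns the record's offset."""
        self.records.append(record)
        return len(self.records) - 1

    def subscribe(self, offset=0):
        """Consumers replay the stream from any offset (capability 3)."""
        while offset < len(self.records):
            yield offset, self.records[offset]
            offset += 1

topic = ToyLog()
topic.publish({"player": 7, "x": 12.3})
topic.publish({"player": 10, "x": 50.8})
events = list(topic.subscribe())      # [(0, {...}), (1, {...})]
```

Real Kafka adds partitioning, replication, and consumer groups on top of exactly this log abstraction.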
20. 20
• Fast and general engine for large-scale data processing
• Developed in response to processing limitations with MapReduce
• 10x faster than MapReduce on disk
• 100x faster than MapReduce in memory
• Has a stack of libraries including Spark Streaming & MLlib (machine learning)
• Runs everywhere; on Hadoop or Standalone
Spark Background
22. 22
Next Step: From Analysis to Prediction
Team stats:
• Who is most likely to score next?
• Which team is going to win?
Individual player stats:
• Which player needs a rest/bench?
• Which players are being traded? (bring in historical data)
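The "who needs a rest" question can be approximated directly from the x, y, z frames by accumulating distance covered per player. A toy sketch with invented player traces (real analysis would also weight sprints, time on pitch, and so on):

```python
from math import dist

def distance_covered(frames):
    """Total distance a player moves across successive (x, y, z) frames."""
    return sum(dist(a, b) for a, b in zip(frames, frames[1:]))

# Hypothetical per-player position traces in meters (made-up data).
traces = {
    "player_7":  [(0, 0, 0), (3, 4, 0), (6, 8, 0)],   # two 5 m steps
    "player_10": [(0, 0, 0), (1, 0, 0)],              # one 1 m step
}

# The player who has run the most is the leading rest candidate.
rest_candidate = max(traces, key=lambda p: distance_covered(traces[p]))
```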
23. 23
Free Trial: Talend Big Data Sandbox
• A ready-to-run Docker environment
• A step-by-step expert guide: the cookbook
• Real-world scenarios using Spark, Kafka, MapReduce & NoSQL
• IoT Analytics
• Real-time Recommendation
• Clickstream Analysis
• Weblogs Analysis
• EDW Offload
www.talend.com/BigDataSandbox
Hit the Easy Button for Hadoop, Spark and Machine Learning
24. 24
• An active community
• 80,000 visitors/week
• 3M total downloads
• Engaged members
• Individual members &
partners
• Active User Groups
• 1,000+ components built by the community
The NEW Talend Community
25. 25
Talend Data Masters Awards
• Share your Talend story & win $1,500 for your favorite charity
• Deadline: July 28th
• https://info.talend.com/datamasters2017all.html
Editor's Notes
More often than not, the data people analyze today is volatile – it comes and goes, is analyzed, and is gone.
The idea was that you needed to download Twitter to do anything of value with social analytics, but that's not true… there's an API for that.
Data analytics is important to every organization, no matter the size, so "big" is different for everyone.
Velocity and variety of the data
Who here is a sports fan? Big fantasy league players here?
Big data is an interesting marketing
The 4.5 trillion frames per second is the FASTEST slow-motion camera to date; it is used to capture the moments leading up to, during, and after a chemical reaction… not something we'd need for a goal-line review, but it certainly exemplifies the big data challenge we are presenting.
If you were to watch this manually, it would take you hundreds of thousands of years to process… hope you didn't have plans.
NFL Zebra – RFIDs in jerseys – force impact, speed, concussion rates
NBA, you’d think they could keep the traveling down to a minimum
Goal Line technology
There is a lot of value in the data created behind this; influence outcomes even by a small fraction and we're talking about millions.
Now we’re going to break this challenge up into two sections, the first will cover all aspects of the image collection and video processing, the second covers the analytics
The first question that needs to be asked when architecting a solution for processing video and image data is what do I need to solve the problem. A lot of architectural decisions will be made depending on this question.
Is the challenge to identify that what I am seeing is a car? do I need to know what color it is? Or what the model is? Or in the case of video, can I tell the difference between one car and another? Perhaps I am just getting a general flow of traffic on a highway, or am I trying to identify the market share of one of my competitors by identifying the ratio of my car brands vs theirs within a given area?
Almost all video and image processing pipelines look like this.
We’re capturing the raw video format and they compressing / encoding.
Next we process the video to extract relevant metadata and then pass that information further downstream to our analytical process. There are a lot of questions as to where and when to do certain steps and we’ll walk though them in the following slides.
This makes a very strong argument for processing and handling it as locally as possible to work with that high bandwidth.
18.88 Mbps in most urban areas, with it even higher for a premium.
The FCC recently found that 39% of rural populations lack target levels of speed: 25 Mbps for downloads and 3 Mbps uploads
This impacts things like smart farming and smart agriculture.
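A quick calculation shows why local processing matters: an uncompressed 1080p30 feed (assuming 24-bit RGB, an illustrative choice) is roughly 500 times the FCC's 3 Mbps rural upload target mentioned above.

```python
# Uncompressed 1080p30 video versus a 3 Mbps uplink.
width, height = 1920, 1080
bytes_per_pixel = 3          # 24-bit RGB, an assumption
fps = 30

raw_bits_per_second = width * height * bytes_per_pixel * 8 * fps
uplink_bits_per_second = 3_000_000   # FCC 3 Mbps rural upload target

ratio = raw_bits_per_second / uplink_bits_per_second
# raw_bits_per_second -> 1,492,992,000 (~1.5 Gbps); ratio -> ~498x over budget
```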
Some HD video cameras output uncompressed video, whereas others compress the video using a lossy compression method such as MPEG or H.264; H.265 is also picking up.
HEVC was developed with the goal of providing twice the compression efficiency of the previous standard, H.264 / AVC
At an identical level of visual quality, HEVC enables video to be compressed to a file that is about half the size (or half the bit rate) of AVC.
When compressed to the same file size or bit rate as AVC, HEVC delivers significantly better visual quality.
NFL stadiums tend to have hundreds to thousands of servers within the stadium devoted to encoding and metadata processing.
The usual suspects,
Amazon,
Google,
Microsoft,
IBM …. Just to name a few
While a lot of the camera hardware vendors will provide this processing capability, I did a check and there are some 30+ available APIs out there to handle the video processing. This is likely the most complex and use-case-specific process, and I have yet to find a one-size-fits-all API.
This makes a very strong argument for processing and handling it as locally as possible to work with that high bandwidth
But as discussed, as work continues on codec compression and infrastructure improves upload bandwidth, we might get to the point where this discussion becomes moot.
In short, the better we get at lossless compression, the more flexible we can be in this step… where's Pied Piper when you need them?
So with that in mind, I'd like to show you how you could build a process like this. We're going to take the Google Vision API for a little spin: I am going to gather you up and we're going to take a picture that I'll post on Twitter and pull down using Talend to analyze with the Google Vision API. It will spit out some interesting results and hopefully recognize you all as people and see your faces.
So we just covered how to architect something to handle video processing and discussed some of the trade-offs for locality of service, finishing off with a demo highlighting some of the work cloud-based companies like Google are doing to democratize the video and image metadata gathering process.
So now lets focus on the analytical side. Where we left off from the video processing architecture was that the video data had been converted into a metadata representation. We’re going to want to work with that in a more general analytical setting.
So going back to our conversation earlier about sports analytics and the gobs of money it brings in, we see coaches, analysts, even the average sports viewer looking for insight into their favorite players; looking for ways to optimize their strategy to improve success.
In the case we have here, which is focused on data collected from the EPL, players are often running all over the place and identifying when they are getting tired can be important intel for both teams. When you have players playing well into their 40s, you want to make sure one of them isn't going to break a hip or something…
The NFL is doing similar fact finding with regards to force impact analysis. With so much attention on concussion rates and effects, you bet everyone is making sure they keep their $120 million franchise player safe and healthy.
Here's just an example of what is in the JSON information we receive; while it's not the 4.5 trillion frames per second…
Consistent Growth
1,500 members in the new Community.Talend.com INTERNAL ONLY
3M total downloads of Talend software to date since the company was founded (includes TOS + evals)
In 2016, we had 360,000 total downloads, up 14% since 2015 (total downloads include TOS + evals)
Engaged members:
Members: Our community members are “strategic partners” in solving data challenges—not just Talend challenges.
Talend Advocates: Small-to-medium SIs and VARs are some of the greatest Talend champions in the community. They share their technical expertise, and by sharing their knowledge they get visibility and find new customers
Thought Leaders: We’re about to launch a new Discussion Board about IoT/Smart Cities. By comparison, competitors use their forum for product support only.
The health of a community is measured by engagement, not just growth
User Groups:
Not only do we have community members that actively respond to questions on the forum ….
…. we also have customers who are creating and managing User Groups around the world (US, UK, Germany, France, Belgium, Switzerland, and India)
Our User Group in Portland, Maine, and Vancouver, Canada were launched by customers, and so were many others.
The Community Team is launching one NEW user group per quarter. In 2017, we plan to launch new user groups in Chicago, Dallas, Toronto, and Atlanta. Vancouver was launched in Q1.
Every day, we have about 400 online concurrent users.
Monetization:
Both Talend and the Talend partners know how to monetize the community.
Talend has been converting open source customers (e.g., Judicial Court of California, Mogo Finance Technology) from Open Studio to the commercial version, Talend Data Integration
And partners who are active on the community are finding new business (some of the most active members are SI partners)
Criteria
Creativity and uniqueness of use
Scope and complexity of project
Business transformation and improvement
Timeline
We are accepting entries until July 28, 2017. Hurry and send your entries now!
Winners will be notified in September.
Winners will be announced in November.
Eligibility Requirements
Award winners should be willing to have their story shared publicly on Talend web site (company logo, video and case study) and promoted on social media and in press announcements.