Abstract:
Join Nick Piette, Director of Evangelism at Talend, as he brings you a deep, technical discussion on the real-world data pipeline that underlies modern sports. Working from real-time instrumentation data collected during play, and using open source tools, Nick will show you how to produce meaningful analytics results in minutes. If you are using Kafka, Spark, or any real-time data science technologies, or even if you are just trying to get a better understanding of them, this event is for you.
Speaker’s bio:
Nick Piette is the Director of Evangelism at Talend. He has spent the last eight years helping enterprises with many different data processing challenges. Nick enjoys sharing the most compelling big data use cases that are changing the world.
Goertek’s Experience with the Qualcomm Virtual Reality (VR) Accelerator Program - AugmentedWorldExpo
A talk from the Develop Track at AWE USA 2018 - the World's #1 XR Conference & Expo in Santa Clara, California, May 30 - June 1, 2018.
Goertek’s Experience with the Qualcomm Virtual Reality (VR) Accelerator Program with
Said Bakadir (Qualcomm)
Allen Chien (Goertek)
Qualcomm Technologies announced the VR HMD Accelerator Program to help device manufacturers quickly develop premium standalone VR HMDs. Goertek is Qualcomm’s primary original device manufacturer (ODM) partner and has developed multiple generations of VR HMD reference designs. This talk will share Goertek’s experiences with Qualcomm Technologies’ HMD Accelerator program and demonstrate how the program enables OEMs to improve their overall development experience and shorten time to commercialization. This program allows them to focus on their own customizations and content while leveraging Goertek’s engineering, design, and manufacturing experience in VR.
http://AugmentedWorldExpo.com
Walmart & IBM Revisit the Linear Road Benchmark - Roger Rea, IBM - Redis Labs
The Linear Road benchmark was devised in 2004 to compare stream data management systems. Walmart selected Linear Road to compare the performance of streaming analytics offerings. IBM implemented the benchmark application using Redis to maintain state and IBM Streams to handle the incoming events and queries. Walmart had to completely revamp the data drivers and test verification to take advantage of the multicore, multithreaded servers available today. Tests were run on the Microsoft Azure cloud to ensure a fair comparison of vendors. Redis and IBM Streams handled nearly 1 billion events in a 3-hour test on a single 16-core Azure node, and 3.8 billion when scaled out to 4 nodes. Come learn about the application and the near-linear scalability of Redis and IBM Streams.
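The abstract doesn't show the implementation, but the state-keeping pattern it describes can be sketched in a few lines. Here a plain dict stands in for the Redis hashes that hold per-car state; the event fields and toll rule are hypothetical simplifications, not the actual Linear Road logic.

```python
# Sketch of the state-keeping pattern described above: each incoming
# position report updates per-car state (a dict standing in for Redis
# hashes), and entering a new road segment triggers a toll charge.
# Field names and the toll rule are hypothetical.

car_state = {}   # car_id -> {"segment": ..., "tolls": ...}

def on_position_report(car_id, segment, toll_for_segment):
    """Update a car's state; charge a toll only when it enters a new segment."""
    state = car_state.setdefault(car_id, {"segment": None, "tolls": 0})
    if state["segment"] != segment:
        state["segment"] = segment
        state["tolls"] += toll_for_segment
    return state["tolls"]

# Car 7 crosses two segments, then re-reports the second without a new charge.
on_position_report(7, segment=1, toll_for_segment=2)
on_position_report(7, segment=2, toll_for_segment=3)
total = on_position_report(7, segment=2, toll_for_segment=3)
```

In the real benchmark this state lives in Redis so that many Streams workers can share it; the update-on-change shape is the same.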
Big Data is everywhere these days. But what is it and how can you use it to fuel your business? Data is as important to organizations as labour and capital, and if organizations can effectively capture, analyze, visualize and apply big data insights to their business goals, they can differentiate themselves from their competitors and outperform them in terms of operational efficiency and the bottom line.
Join this session to understand the different AWS Big Data and Analytics services such as Amazon Elastic MapReduce (Hadoop), Amazon Redshift (Data Warehouse) and Amazon Kinesis (Streaming), when to use them and how they work together.
Reasons to attend:
Learn how AWS can help you process and make better use of your data with meaningful insights.
Learn about Amazon Elastic MapReduce for managed Hadoop processing, and Amazon Redshift, a fully managed petabyte-scale data warehouse.
Learn about real time data processing with Amazon Kinesis.
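Kinesis distributes incoming records across shards by MD5-hashing each record's partition key into a 128-bit number and mapping it into a shard's hash-key range. The sketch below mimics that routing locally (no AWS calls), assuming equal-sized hash-key ranges; it is an illustration of the mechanism, not the boto3 API.

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Mimic Kinesis routing: MD5-hash the partition key to a 128-bit
    integer, then map it into one of num_shards equal hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // num_shards
    return min(h // range_size, num_shards - 1)

# Records with the same partition key always land on the same shard,
# which is what preserves per-key ordering in a Kinesis stream.
shard = shard_for_key("sensor-42", 4)
```

This is why choosing a high-cardinality partition key matters: with few distinct keys, some shards sit idle while others are throttled.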
OVH Analytics Data Compute with Apache Spark as a Service - Meetup OVH Bordeaux - Mojtaba Imani
90% of the data in the world today has been created in the last two years, and the world will be creating 163 zettabytes of data a year by 2025. So how do we process this volume of data?
Apache Spark is a trending open-source distributed general-purpose cluster computing framework. But how do you create a computing cluster quickly and efficiently? Should you do all the network configuration and cluster management yourself? What should you do with your cluster once you no longer need it? Is your cluster secure?
After covering Apache Spark principles and use cases, you will discover OVH Analytics Data Compute: a fast, secure, and efficient Spark cluster as a service that answers all of these questions.
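The core Spark principle the talk alludes to is simple to sketch: transformations run independently on each data partition, and partial results are then merged in a reduce step. The toy word count below illustrates that model in plain Python; it deliberately does not use the Spark API.

```python
from collections import Counter
from functools import reduce

# Toy illustration of Spark's execution model (not the Spark API):
# each "partition" is processed independently, as it would be on a
# worker node, then the partial results are merged in a reduce step.
partitions = [
    ["spark makes big data simple", "spark is fast"],
    ["clusters process data in parallel"],
]

def map_partition(lines):
    """Runs on each worker: count words in one partition."""
    return Counter(word for line in lines for word in line.split())

partials = [map_partition(p) for p in partitions]       # parallel map
word_counts = reduce(lambda a, b: a + b, partials)      # shuffle/merge
```

Spark's value is running exactly this shape of computation across many machines with fault tolerance, which is what makes cluster provisioning (the subject of the talk) the hard part.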
OVH Analytics Data Compute - Apache Spark Cluster as a Service - OVHcloud
You need Apache Spark computation over a big Apache Spark cluster but you don't have the computers?
You don't have enough time to create a cluster of computers and do all the installation and configuration?
You only need a cluster for a few hours, not forever?
Or you just want an easy way to try out the power of Apache Spark? Discover OVH Analytics Data Compute!
How to Build a Scylla Database Cluster that Fits Your Needs - ScyllaDB
Sizing a database cluster makes or breaks your application. Too small, and you can't sustain spikes in usage or recover from a node loss or an operational slowdown. Too big, and your cluster will cost more and waste valuable human resources.
Since different workloads have different requirements, sizing your cluster successfully means optimizing for both throughput and latency; in many cases, however, those requirements contradict each other.
In this webinar, we explain how to reconcile these contradicting forces and build a sustainable cluster that meets both performance and resiliency requirements.
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ... - Dataconomy Media
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can Speed up the World"
Bio:
Ronan Corkery is a kdb+ engineer who has been working with Kx and First Derivatives for the past four years. Currently based at Total Gas and Power, he spent his first two years working with Morgan Stanley.
Abstract:
Ronan's presentation will focus on the vertical industries that Kx's technologies, formerly used only in finance, have been moving into. He will present proven solutions, introduce the overall architecture that Kx uses, and lay out potential opportunities to work with Kx.
Tech Talk: Moneyball - Hitting real-time apps out of the park with Big Memory - MemVerge
A webinar hosted by MemVerge, Intel, NVIDIA, and The Next Platform. Timothy Prickett Morgan, co-editor of The Next Platform, provides his view of the Big Memory category. Mark DeMarseilles of Intel gives an update covering the new Optane Persistent Memory 200 Series. Rob Davis of NVIDIA explains why Big Memory needs low-latency networks to distribute messages, replicate data, and provide high availability, all without jitter. Charles Fan of MemVerge describes Memory Machine software and different use cases, including faster crash recovery, higher VM density, and high-frequency trading.
Modern Data Stack for Game Analytics / Dmitry Anoshin (Microsoft Gaming, The ... - DevGAMM Conference
This talk will cover the journey of designing and implementing a data platform for the game analytics industry. I will describe the modern data stack: what tools and approaches are available on the market, and how leading game companies engineer data analytics solutions and make better games with data insights.
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their share. I will present a working demonstration of how modern data streaming can be applied to the data acquisition and analysis problem posed by modern motorsports.
Instead of bringing multiple Formula 1 cars to the talk, I will show how we instrumented a high fidelity physics-based automotive simulator to produce realistic data from simulated cars running on the Spa-Francorchamps track. We move data from the cars, to the pits, to the engineers back at HQ.
The result is near real-time visualization and comparison of performance, and a great exposition of how to move data using messaging systems like Kafka, process it in real time with Apache Spark, and then analyze it using SQL with Apache Drill.
Code available here: https://github.com/mapr-demos/racing-time-series
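The kind of near-real-time comparison the talk describes can be sketched as a windowed aggregation over the telemetry stream: average speed per car in tumbling windows. The real demo does this with Kafka and Spark (see the repository above); the field names and window logic below are a simplified, hypothetical stand-in.

```python
from collections import defaultdict

# Hedged sketch of a tumbling-window aggregation over car telemetry:
# average speed per (car, 10-second window). In the demo this runs as a
# Spark streaming job over Kafka; field names here are hypothetical.
def windowed_avg_speed(events, window_s=10):
    """events: iterable of (car_id, timestamp_s, speed_kph) tuples."""
    buckets = defaultdict(list)            # (car_id, window index) -> speeds
    for car, ts, speed in events:
        buckets[(car, int(ts // window_s))].append(speed)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

telemetry = [("car1", 0.5, 200), ("car1", 4.0, 220),
             ("car2", 3.0, 210), ("car1", 12.0, 240)]
averages = windowed_avg_speed(telemetry)
```

Comparing two cars lap by lap then reduces to comparing their per-window aggregates, which is what the pit-wall visualization renders.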
The Rise Of Event Streaming – Why Apache Kafka Changes Everything - Kai Wähner
Business digitalization trends like microservices, the Internet of Things, and machine learning are driving the need to process events at a whole new scale, speed, and efficiency. Traditional solutions like ETL/data integration or messaging were not built to serve these needs.
Today, the open source project Apache Kafka® is used by thousands of companies, including over 60% of the Fortune 100, to power and innovate their businesses by centering their data strategies on event-driven architectures that leverage event streaming. We will discuss the market and technology changes that have given rise to Kafka and to event streaming, and we will introduce the audience to the key aspects of building an event streaming platform with Kafka. Examples of production use cases from the automotive, manufacturing, and transportation sectors will showcase the power of event streaming.
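Kafka's core abstraction, and the reason it suits event-driven architectures, is a partitioned append-only log that many consumer groups read at their own offsets. The toy in-memory sketch below illustrates that idea; it is not the Kafka API, and a real deployment adds partitioning, replication, and persistence.

```python
# Toy illustration of Kafka's core abstraction (not the Kafka API): an
# append-only event log that multiple consumer groups read at their own
# offsets, so one event stream can feed many independent applications.
class Log:
    def __init__(self):
        self.events = []
        self.offsets = {}                  # consumer group -> next offset

    def append(self, event):
        self.events.append(event)

    def poll(self, group):
        """Return unread events for a group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.events[start:]
        self.offsets[group] = len(self.events)
        return batch

log = Log()
log.append({"order": 1})
log.append({"order": 2})
first = log.poll("billing")     # billing sees both events
log.append({"order": 3})
second = log.poll("billing")    # only the new event
fresh = log.poll("analytics")   # a new group replays from the start
```

Because the log is retained rather than consumed destructively, new applications can replay history, which is the property that distinguishes event streaming from traditional messaging.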
Transforming the Database: Critical Innovations for Performance at Scale - ScyllaDB
Your team is serious about ensuring database performance at scale. But legacy NoSQL technology could be eroding the impact of your achievements.
Following best practices for efficient data modeling, query optimization, and observability is fundamental. But their power can be limited – or lifted – by specific database capabilities. Often-overlooked database innovations can serve as a force multiplier, paving a much smoother path to speed at scale (e.g., millions of read/write operations and millisecond P99 response).
This webinar provides a technical deep dive into several such database innovations. ScyllaDB engineers will provide an inside look at innovations dev teams are using to:
- Squeeze every ounce of performance from modern cloud infrastructure
- Accommodate volatile traffic without overprovisioning
- Gain the advantage of external caching without the associated hassle and risks
- Prioritize the performance of latency-sensitive transactional workloads over higher throughput analytics workloads in the same cluster
The increasing demand for computing power in fields such as biology, finance, and machine learning is pushing the adoption of reconfigurable hardware in order to keep up with the required performance level at a sustainable power consumption. Within this context, FPGA devices represent an interesting solution, as they combine the benefits of power efficiency, performance, and flexibility. Nevertheless, the steep learning curve and experience needed to develop efficient FPGA-based systems represent one of the main limiting factors for a broad utilization of such devices.
In this talk, I will first present CAOS, a framework which helps the application designer identify acceleration opportunities and guides them through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, from identifying the kernel functions to accelerate, to optimizing those kernels, to generating the runtime management and the configuration files needed to program the FPGA. After CAOS, I will present the HUGenomics project, based on the CAOS framework. The unique genetic profile of a species is leading to the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit the progress of these fields. The HUGenomics framework aims at facilitating the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. Indeed, the system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture.
Lessons learned building a big data analytics engine, from proprietary to ope... - J On The Beach
Lessons learned building a big data analytics engine, from proprietary to open source by Álvaro Santamaria & Joel Brunger
After the team spent four years building a proprietary all-in-one streaming analytics engine for financial services, it became clear that open source was starting to pull ahead. Álvaro will talk about the challenges of creating an IT operations solution for financial services: what to build, what not to build, and how to use open-source tools to get past the infrastructure and focus on the business problems that matter.
Data Con LA 2022 - Using Google trends data to build product recommendations - Data Con LA
Mike Limcaco, Analytics Specialist / Customer Engineer at Google
Measure trends in a particular topic or search term on Google Search across the US, down to the city level. Integrate these data signals into analytic pipelines to drive product, retail, and media (video, audio, digital content) recommendations tailored to your audience segment. We'll discuss how Google's unique datasets can be used with Google Cloud's smart analytics services to process, enrich, and surface the most relevant product or content matching the ever-changing interests of your local customer segment.
Melinda Thielbar, Data Science Practice Lead and Director of Data Science at Fidelity Investments
From corporations to governments to private individuals, most of the AI community has recognized the growing need to incorporate ethics into the development and maintenance of AI models. Much of the current discussion, though, is meant for leaders and managers. This talk is directed to data scientists, data engineers, ML Ops specialists, and anyone else who is responsible for the hands-on, day-to-day work of building, productionalizing, and maintaining AI models. We'll give a short overview of the business case for why technical AI expertise is critical to developing an AI ethics strategy. Then we'll discuss the technical problems that cause AI models to behave unethically, how to detect problems at all phases of model development, and the tools and techniques that are available to support technical teams in ethical AI development.
Data Con LA 2022 - Improving disaster response with machine learning - Data Con LA
Antje Barth, Principal Developer Advocate, AI/ML at AWS & Chris Fregly, Principal Engineer, AI & ML at AWS
The frequency and severity of natural disasters are increasing. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response. Many organizations are accelerating their efforts to make their data publicly available for others to use. Repositories such as the Registry of Open Data on AWS and the Humanitarian Data Exchange contain troves of data available for use by developers, data scientists, and machine learning practitioners. In this session, see how a community of developers came together through the AWS Disaster Response hackathon to build models to support natural disaster preparedness and response.
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas - Data Con LA
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a Developer Data Platform. Come learn what's new in the 6.0 release and Atlas, following all the recent announcements made at MongoDB World 2022. Topics will include:
- Atlas Search, which combines three systems into one (database, search engine, and sync mechanism), letting you focus on your product's differentiation
- Atlas Data Federation, to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake, and AWS S3 buckets
- Queryable Encryption, which lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator, which analyzes your existing relational schemas and helps you design a new MongoDB schema
- And more!
Data Con LA 2022 - Real world consumer segmentation - Data Con LA
Jaysen Gillespie, Head of Analytics and Data Science at RTB House
1. Shopkick has over 30M downloads, but the userbase is very heterogeneous. Anecdotal evidence indicated a wide variety of users for whom the app holds long-term appeal.
2. Marketing and other teams challenged Analytics to get beyond basic summary statistics and develop a holistic segmentation of the userbase.
3. Shopkick's data science team used SQL and python to gather data, clean data, and then perform a data-driven segmentation using a k-means algorithm.
4. Interpreting the results is more work -- and more fun -- than running the algo itself. We'll discuss how we transformed "segment 1", "segment 2", etc. into something that non-analytics users (Marketing, Operations, etc.) could actually benefit from.
5. So what? How did teams across Shopkick change their approach given what Analytics had discovered?
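The segmentation step described above can be sketched as a minimal k-means loop. In practice the team would likely use `sklearn.cluster.KMeans`; the plain-Python version below shows the algorithm itself, and the toy 2-D user features (visits per week, average basket size) are hypothetical.

```python
# Minimal k-means sketch of the segmentation step described above.
# Toy features and starting centroids are hypothetical.
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to its cluster's mean
        # (keep the old centroid if a cluster is empty).
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# (visits/week, avg basket $): two obvious user segments.
users = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 90), (8, 85)]
centroids, clusters = kmeans(users, centroids=[(0, 0), (10, 100)])
```

As the abstract notes, the algorithm is the easy part; naming and acting on the resulting segments is where the analysis work actually lies.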
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo... - Data Con LA
Ravi Pillala, Chief Data Architect & Distinguished Engineer at Intuit
TurboTax is one of the most well-known consumer software brands, serving 385K+ concurrent users at its peak. In this session, we start by looking at how user behavioral data and tax domain events are captured in real time using the event bus and analyzed to drive real-time personalization with various TurboTax data pipelines. We will also look at solutions performing analytics on these events with the help of Kafka, Apache Flink, Apache Beam, Spark, Amazon S3, Amazon EMR, Redshift, Athena, and AWS Lambda functions. Finally, we look at how SageMaker is used to create the TurboTax model that predicts whether a customer is at risk or needs help.
Data Con LA 2022 - Moving Data at Scale to AWS - Data Con LA
George Mansoor, Chief Information Systems Officer at California State University
Overview of the CSU data architecture for moving on-prem ERP data to the AWS Cloud at scale, using Delphix for data replication/virtualization and AWS Database Migration Service (DMS) for data extracts.
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI - Data Con LA
Anand Ranganathan, Chief AI Officer at Unscrambl
Conversational AI is increasingly widely used for customer-support and employee-support use cases. In this session, I'm going to talk about how it can be extended to data analysis and data science use cases, i.e., how users can interact with a bot to ask analytical questions of data in relational databases.
This allows users to explore complex datasets using a combination of text and voice questions, in natural language, and then get back results in a combination of natural language and visualizations. Furthermore, it allows collaborative exploration of data by a group of users in a channel in platforms like Microsoft Teams, Slack or Google Chat.
For example, a group of users in a channel can ask questions to a bot in plain English like "How many cases of Covid were there in the last 2 months by state and gender" or "Why did the number of deaths from Covid increase in May 2022", and jointly look at the results that come back. This facilitates data awareness, data-driven collaboration and joint decision making among teams in enterprises and outside.
In this talk, I'll describe how we bring together various features, including natural-language understanding, NL-to-SQL translation, dialog management, data storytelling, semantic modeling of data, and augmented analytics, to facilitate collaborative exploration of data using conversational AI.
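To make the NL-to-SQL idea concrete, here is a deliberately tiny rule-based sketch. Real systems like the one described use ML-based semantic parsing over a semantic model of the data; the single regex pattern, table name, and column names below are hypothetical.

```python
import re

# Toy rule-based sketch of NL-to-SQL translation. Production systems use
# learned semantic parsing; the pattern and schema here are hypothetical.
def nl_to_sql(question):
    m = re.match(r"how many (\w+) were there in (\w+) by (\w+)",
                 question.lower())
    if not m:
        raise ValueError("pattern not recognized")
    entity, region, group_col = m.groups()
    return (f"SELECT {group_col}, COUNT(*) FROM {entity} "
            f"WHERE state = '{region}' GROUP BY {group_col}")

sql = nl_to_sql("How many cases were there in California by gender")
```

Even this toy shows the pipeline shape: parse the question into slots (entity, filter, grouping), then render those slots into SQL against a known schema.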
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ... - Data Con LA
Anil Inamdar, VP & Head of Data Solutions at Instaclustr
The most modernized enterprises utilize polyglot architecture, applying the best-suited database technologies to each of their organization's particular use cases. To successfully implement such an architecture, though, you need a thorough knowledge of the expansive NoSQL data technologies now available.
Attendees of this Data Con LA presentation will come away with:
-- A solid understanding of the decision-making process that should go into vetting NoSQL technologies, and how to plan out data modernization initiatives and migrations.
-- The types of functionality that best match the strengths of NoSQL key-value stores, graph databases, columnar databases, document databases, time-series databases, and more.
-- How to navigate database technology licensing concerns and recognize the types of vendors they'll encounter across the NoSQL ecosystem, including sniffing out open-core vendors that may advertise as "open source" but are driven by a business model that hinges on achieving proprietary lock-in.
-- How to determine whether vendors offer open-code solutions with restrictive licensing, or support true open source technologies like Hadoop, Cassandra, Kafka, OpenSearch, Redis, Spark, and many more that offer total portability and true freedom of use.
Data Con LA 2022 - Intro to Data Science - Data Con LA
Zia Khan, Computer Systems Analyst and Data Scientist at LearningFuze
This Data Science tutorial is designed for people who are new to data science. It is a beginner-level session, so no prior coding or technical knowledge is required; just bring your laptop with WiFi capability. The session starts with a review of what data science is, the amount of data we generate, and how companies are using that data to gain insight. We will pick a business use case and define the data science process, followed by a hands-on lab using Python and a Jupyter notebook. During the hands-on portion we will work with the pandas, numpy, matplotlib, and sklearn modules and use a machine learning algorithm to approach the business use case.
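The hands-on flow the session describes (load, clean, summarize) compresses into a few lines of pandas. The toy churn dataset below is hypothetical, and pandas is assumed to be installed, as it would be in the lab environment.

```python
import pandas as pd

# Compressed sketch of the lab flow: load data, clean a missing value,
# and pull a simple business insight. Toy churn data is hypothetical.
df = pd.DataFrame({
    "customer": ["a", "b", "c", "d", "e"],
    "monthly_spend": [20.0, None, 35.0, 50.0, 15.0],
    "churned": [True, False, False, False, True],
})

# Cleaning step: impute the missing spend with the column mean.
df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].mean())

# Insight step: do churned customers spend less on average?
avg_by_churn = df.groupby("churned")["monthly_spend"].mean()
churned_avg = avg_by_churn.loc[True]
retained_avg = avg_by_churn.loc[False]
```

In the session this kind of summary would feed a sklearn model; the load-clean-explore steps above are the part beginners spend most of their time on.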
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment - Data Con LA
Mariana Danilovic, Managing Director at Infiom, LLC
We will address:
(1) Community creation and engagement using tokens and NFTs
(2) Organization of DAO structures and ways to incentivize Web3 communities
(3) DeFi business models applied to Web3 ventures
(4) Why Metaverse matters for new entertainment and community engagement models.
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat... - Data Con LA
Curtis ODell, Global Director Data Integrity at Tricentis
Join me to learn about a new end-to-end data testing approach designed for modern data pipelines that fills dangerous gaps left by traditional data management tools—one designed to handle structured and unstructured data from any source. You'll hear how you can use unique automation technology to reach up to 90 percent test-coverage rates and deliver trustworthy analytical and operational data at scale. Several real-world use cases from major banks/finance, insurance, and health analytics, plus Snowflake examples, will be presented.
Key Learning Objectives
1. Data journeys are complex, and you have to ensure the integrity of the data end to end across this journey, from source to final reporting, for compliance.
2. Data management tools do not test data; they profile and monitor at best, leaving serious gaps in your data testing coverage.
3. Automation with integration into DevOps and DataOps CI/CD processes is key to solving this.
4. How this approach has impact in your vertical.
Data Con LA 2022 - Perfect Viral Ad prediction of Superbowl 2022 using Tease, T... - Data Con LA
Arif Ansari, Professor at University of Southern California
A Super Bowl ad costs $7 million, and each year a few Super Bowl ads go viral. Traditional A/B testing does not predict virality. Some highly shared ads reach over 60 million organic views, which can be more valuable than views on TV. Not only are these views voluntary, but they are typically without distraction and win viewer engagement in the form of likes, comments, or shares. A Super Bowl ad that wins 69 million views on YouTube (e.g., Alexa Mind Reader) costs less than 10 cents per quality view! The challenge, however, is triggering virality. We developed a method to predict virality and engineer virality into ads.
1. Prof. Gerard J. Tellis and co-authors recommended that advertisers use YouTube to tease, test, and tweak (TTT) their ads to maximize sharing and viewing. 2022 saw that maxim put into practice.
2. We developed viral Ads prediction using two scientific models:
a. Prof. Gerard Tellis et al.'s model for viral prediction
b. Deep Learning viral prediction using social media effect
3. The model was able to identify all of the top 15 viral ads and performed better than the traditional agencies.
4. The newly proposed method is Tease, Test, Tweak, Target, and Spot Ads.
Data Con LA 2022 - Embedding medical journeys with machine learning to improve... - Data Con LA
Jai Bansal, Senior Manager, Data Science at Aetna
This talk describes an internal data product called Member Embeddings that facilitates modeling of member medical journeys with machine learning.
Medical claims are the key data source we use to understand health journeys at Aetna. Claims are the data artifacts that result from our members' interactions with the healthcare system. Claims contain data like the amount the provider billed, the place of service, and provider specialty. The primary medical information in a claim is represented in codes that indicate the diagnoses, procedures, or drugs for which a member was billed. These codes give us a semi-structured view into the medical reason for each claim and so contain rich information about members' health journeys. However, since the codes themselves are categorical and high-dimensional (10K cardinality), it's challenging to extract insight or predictive power directly from the raw codes on a claim.
To transform claim codes into a more useful format for machine learning, we turned to the concept of embeddings. Word embeddings are widely used in natural language processing to provide numeric vector representations of individual words.
We use a similar approach with our claims data. We treat each claim code as a word or token and use embedding algorithms to learn lower-dimensional vector representations that preserve the original high-dimensional semantic meaning.
This process converts the categorical features into dense numeric representations. In our case, we use sequences of anonymized member claim diagnosis, procedure, and drug codes as training data. We tested a variety of algorithms to learn embeddings for each type of claim code.
We found that the trained embeddings showed relationships between codes that were reasonable from the point of view of subject matter experts. In addition, using the embeddings to predict future healthcare-related events outperformed other basic features, making this tool an easy way to improve predictive model performance and save data scientist time.
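The intuition behind such embeddings can be shown with a toy distributional sketch: codes that appear in the same claim contexts end up with similar vectors. Production systems (as described above) learn dense low-dimensional vectors with word2vec-style algorithms; the claim codes below are hypothetical and numpy is assumed available.

```python
import numpy as np

# Toy distributional sketch of the idea above: count how often each pair
# of codes shares a claim, then compare codes by the cosine similarity of
# their co-occurrence rows. Codes here are hypothetical.
codes = ["dx_flu", "dx_cold", "rx_antiviral", "px_swab",
         "dx_fracture", "px_xray"]
idx = {c: i for i, c in enumerate(codes)}
claims = [["dx_flu", "rx_antiviral", "px_swab"],
          ["dx_cold", "rx_antiviral", "px_swab"],
          ["dx_fracture", "px_xray"]]

cooc = np.zeros((len(codes), len(codes)))
for claim in claims:
    for a in claim:
        for b in claim:
            if a != b:
                cooc[idx[a], idx[b]] += 1

def sim(a, b):
    u, v = cooc[idx[a]], cooc[idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

flu_cold = sim("dx_flu", "dx_cold")          # share antiviral/swab contexts
flu_fracture = sim("dx_flu", "dx_fracture")  # no shared contexts
```

Embedding algorithms go further by compressing these sparse context counts into dense vectors, which is what makes the 10K-cardinality codes usable as model features.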
Data Con LA 2022 - Data Streaming with Kafka - Data Con LA
Jie Chen, Manager Advisory, KPMG
Data is the new oil. However, many organizations have data fragmented across siloed lines of business. In this talk, we will focus on identifying the legacy patterns and their limitations, and on introducing the new patterns backed by Kafka's core design ideas. The goal is to tirelessly pursue better solutions that help organizations overcome bottlenecks in their data pipelines and modernize their digital assets so they are ready to scale their businesses. In summary, we will walk through three use cases and share dos and don'ts, along with takeaways for data engineers, data scientists, and data architects developing forefront data-oriented skills.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
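For context on how JMeter-style metrics land in InfluxDB: points are written in InfluxDB's line protocol (`measurement,tags fields timestamp`). The sketch below builds one such line; the `jmeter` measurement and tag names mirror what a Backend Listener typically writes, but the exact field set here is an assumption for illustration:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one metric point in InfluxDB line protocol."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_parts = []
    for k, v in sorted(fields.items()):
        if isinstance(v, bool):
            field_parts.append(f"{k}={str(v).lower()}")
        elif isinstance(v, int):
            field_parts.append(f"{k}={v}i")   # integer fields take an 'i' suffix
        elif isinstance(v, float):
            field_parts.append(f"{k}={v}")
        else:
            field_parts.append(f'{k}="{v}"')  # string fields are double-quoted
    return f"{measurement},{tag_str} {','.join(field_parts)} {timestamp_ns}"

line = to_line_protocol(
    "jmeter",                                   # measurement name
    {"application": "demo", "transaction": "login"},
    {"count": 42, "avg": 123.5},                # illustrative fields
    1717000000000000000,
)
# -> jmeter,application=demo,transaction=login avg=123.5,count=42i 1717000000000000000
```

Grafana then queries these series to draw the real-time dashboards shown in the webinar.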
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... - UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... - Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at every stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
"Impact of front-end architecture on development cost", Viktor Turskyi - Fwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Search and Society: Reimagining Information Access for Radical Futures - Bhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
4. 4
So what?
• Cool use case and all, but what's the value?
• Real-time streams from robotic manufacturing (Audi, Ford, BMW, Toyota)
• Real-time traffic analysis for Smart Cities / theme parks (Denver, Cincinnati, London, Disney, Universal)
• Real-time mechanical data from devices (Aircraft - Air France, Windmills – GE)
• And before you discount this whole sports thing…
• The UK tax office collects £1.3B (~$2B USD) in taxes each year from EPL teams
• Greater than the GDP of the bottom 25% of all countries
• $95 billion wagered annually on NFL and college football
• #1 on Forbes 2000 list by a lot…
10. 10
From Seen To Described
Gigabytes of video data become KB/MB of description data
Most applications that do this conversion are proprietary,
but we are seeing investment in the space by the usual suspects
11. 11
Phone home?
Data tends to be JSON or XML
ONVIF standard for security
Messaging vs. web services?
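Since the payloads tend to be JSON, here is a minimal sketch of parsing one hypothetical tracking message; the field names and values are entirely made up for illustration:

```python
import json

# A hypothetical tracking payload in the JSON style such feeds use.
raw = '''{
  "frame": 1024,
  "timestamp_ms": 40960,
  "players": [
    {"id": 7, "team": "home", "x": 12.3, "y": 45.1, "z": 0.0},
    {"id": 10, "team": "away", "x": 50.8, "y": 33.7, "z": 0.0}
  ]
}'''

frame = json.loads(raw)
# Index each player's position by id for downstream analytics.
positions = {p["id"]: (p["x"], p["y"], p["z"]) for p in frame["players"]}
```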
17. 17
• The camera array sends a feed of 25 frames per second
• Each frame captures the x,y,z coordinates of every player
• A live feed of sports data is actually pretty serious Big Data!
Challenges
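A back-of-envelope sketch of why this feed adds up. The sizes are assumptions (22 players plus the ball, roughly 100 bytes per tracked object once serialized): per camera feed the rate is modest, but multiplied across cameras, matches, and seasons it becomes serious streaming volume.

```python
# All three inputs are illustrative assumptions, not measured values.
objects_per_frame = 23       # 22 players + the ball
frames_per_second = 25       # the camera array's feed rate
bytes_per_object = 100       # rough serialized size per tracked object

bytes_per_second = objects_per_frame * frames_per_second * bytes_per_object
frames_per_match = frames_per_second * 90 * 60   # a 90-minute match

# bytes_per_second -> 57500 (about 56 KB/s per feed)
# frames_per_match -> 135000 position snapshots per match
```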
19. 19
• It lets you publish and subscribe to streams of records. In this respect it is similar to a message queue or enterprise messaging system.
• It lets you store streams of records in a fault-tolerant way.
• It lets you process streams of records as they occur.
Distributed Streaming Platform
Kafka Background
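Those three capabilities can be sketched with a toy in-memory log. This illustrates the model (an append-only log that consumers read by offset), not Kafka's actual API:

```python
class ToyLog:
    """A toy, in-memory stand-in for one Kafka topic partition."""

    def __init__(self):
        self.records = []             # ordered, append-only storage (capability 2)

    def publish(self, record):
        """Producers append records (capability 1); returns the record's offset."""
        self.records.append(record)
        return len(self.records) - 1

    def subscribe(self, offset=0):
        """Consumers replay the stream from any offset (capability 3)."""
        while offset < len(self.records):
            yield offset, self.records[offset]
            offset += 1

topic = ToyLog()
topic.publish({"player": 7, "x": 12.3})
topic.publish({"player": 10, "x": 50.8})
events = list(topic.subscribe())      # [(0, {...}), (1, {...})]
```

Real Kafka adds partitioning, replication, and consumer groups on top of exactly this log abstraction.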
20. 20
• Fast and general engine for large-scale data processing
• Developed in response to processing limitations with MapReduce
• 10x faster than MapReduce on disk
• 100x faster than MapReduce in memory
• Has a stack of libraries including Spark Streaming & MLlib (machine learning)
• Runs everywhere; on Hadoop or Standalone
Spark Background
22. 22
Next Step: From Analysis to Prediction
Team stats:
• Who is most likely to score next?
• Which team is going to win?
Individual player stats:
• Which player needs a rest/bench?
• Which players are being traded? (bring in historical data)
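The "who needs a rest" question can be approximated directly from the x, y, z frames by accumulating distance covered per player. A toy sketch with invented player traces (real analysis would also weight sprints, time on pitch, and so on):

```python
from math import dist

def distance_covered(frames):
    """Total distance a player moves across successive (x, y, z) frames."""
    return sum(dist(a, b) for a, b in zip(frames, frames[1:]))

# Hypothetical per-player position traces in meters (made-up data).
traces = {
    "player_7":  [(0, 0, 0), (3, 4, 0), (6, 8, 0)],   # two 5 m steps
    "player_10": [(0, 0, 0), (1, 0, 0)],              # one 1 m step
}

# The player who has run the most is the leading rest candidate.
rest_candidate = max(traces, key=lambda p: distance_covered(traces[p]))
```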
23. 23
Free Trial: Talend Big Data Sandbox
• A ready-to-run Docker environment
• A step-by-step expert guide: the cookbook
• Real-world scenarios using Spark, Kafka, MapReduce & NoSQL
• IoT Analytics
• Real-time Recommendation
• Clickstream Analysis
• Weblogs Analysis
• EDW Offload
www.talend.com/BigDataSandbox
Hit the Easy Button for Hadoop, Spark and Machine Learning
24. 24
• An active community
• 80,000 visitors/week
• 3M total downloads
• Engaged members
• Individual members &
partners
• Active User Groups
• 1,000+ components built by the community
The NEW Talend Community
25. 25
Talend Data Masters Awards
• Share your Talend story & win $1,500 for your favorite charity
• Deadline: July 28th
• https://info.talend.com/datamasters2017all.html
Editor's Notes
More often than not, the data people analyze today is volatile – it comes and goes, is analyzed, and is gone.
The idea was that you needed to download Twitter to do anything of value with social analytics, but that's not true… there's an API for that.
Data analytics is important to every organization, no matter the size, so "big" is different for everyone.
Velocity and variety of the data
Who here is a sports fan? Big fantasy league players here?
Big data is an interesting marketing
The 4.5 trillion frames per second is the FASTEST slow-motion camera to date; it is used to capture the moments leading up to, during, and after a chemical reaction… not something we'd need for a goal-line review, but it certainly exemplifies the big data challenge we are presenting.
If you were to watch this manually, it would take you hundreds of thousands of years to process… hope you didn't have plans.
NFL Zebra – RFIDs in jerseys – force impact, speed, concussion rates
NBA, you’d think they could keep the traveling down to a minimum
Goal Line technology
There is a lot of value in the data created behind this; influence outcomes even by a small fraction and we're talking about millions.
Now we’re going to break this challenge up into two sections, the first will cover all aspects of the image collection and video processing, the second covers the analytics
The first question that needs to be asked when architecting a solution for processing video and image data is what do I need to solve the problem. A lot of architectural decisions will be made depending on this question.
Is the challenge to identify that what I am seeing is a car? do I need to know what color it is? Or what the model is? Or in the case of video, can I tell the difference between one car and another? Perhaps I am just getting a general flow of traffic on a highway, or am I trying to identify the market share of one of my competitors by identifying the ratio of my car brands vs theirs within a given area?
Almost all video and image processing pipelines look like this.
We’re capturing the raw video format and they compressing / encoding.
Next we process the video to extract relevant metadata and then pass that information further downstream to our analytical process. There are a lot of questions as to where and when to do certain steps and we’ll walk though them in the following slides.
This makes a very strong argument for processing and handling it as locally as possible to work with that high bandwidth.
18.88 Mbps in most urban areas, with it even higher for a premium.
The FCC recently found that 39% of rural populations lack target levels of speed: 25 Mbps for downloads and 3 Mbps uploads
This impacts things like smart farming and smart agriculture.
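A quick calculation shows why local processing matters: an uncompressed 1080p30 feed (assuming 24-bit RGB, an illustrative choice) is roughly 500 times the FCC's 3 Mbps rural upload target mentioned above.

```python
# Uncompressed 1080p30 video versus a 3 Mbps uplink.
width, height = 1920, 1080
bytes_per_pixel = 3          # 24-bit RGB, an assumption
fps = 30

raw_bits_per_second = width * height * bytes_per_pixel * 8 * fps
uplink_bits_per_second = 3_000_000   # FCC 3 Mbps rural upload target

ratio = raw_bits_per_second / uplink_bits_per_second
# raw_bits_per_second -> 1,492,992,000 (~1.5 Gbps); ratio -> ~498x over budget
```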
Some HD video cameras output uncompressed video, whereas others compress the video using a lossy compression method such as MPEG or H.264; H.265 is also picking up.
HEVC was developed with the goal of providing twice the compression efficiency of the previous standard, H.264 / AVC
At an identical level of visual quality, HEVC enables video to be compressed to a file that is about half the size (or half the bit rate) of AVC.
When compressed to the same file size or bit rate as AVC, HEVC delivers significantly better visual quality.
NFL stadiums tend to have hundreds to thousands of servers within the stadium devoted to encoding and metadata processing.
The usual suspects,
Amazon,
Google,
Microsoft,
IBM …. Just to name a few
While a lot of the camera hardware vendors will provide this processing capability, I did a check and there are some 30+ available APIs out there to handle the video processing. This is likely the most complex and use-case-specific process, and I have yet to find a one-size-fits-all API.
This makes a very strong argument for processing and handling it as locally as possible to work with that high bandwidth
But as discussed, as work continues on codec compression and infrastructure improves upload bandwidth, we might get to the point where this discussion becomes moot.
In short, the better we get at lossless compression, the more flexible we can be in this step… where's Pied Piper when you need them?
So with that in mind, I'd like to show you how you could build a process like this. We're going to take the Google Vision API for a little spin: I am going to gather you up and we're going to take a picture that I'll post on Twitter and pull down using Talend to analyze with the Google Vision API. It will spit out some interesting results and hopefully recognize you all as people and see your faces.
So we just covered how to architect something to handle video processing and discussed some of the trade-offs for locality of service, finishing off with a demo highlighting some of the work cloud-based companies like Google are doing to democratize the video and image metadata gathering process.
So now lets focus on the analytical side. Where we left off from the video processing architecture was that the video data had been converted into a metadata representation. We’re going to want to work with that in a more general analytical setting.
So going back to our conversation earlier about sports analytics and the gobs of money it brings in, we see coaches, analysts, even the average sports viewer looking for insight into their favorite players; looking for ways to optimize their strategy to improve success.
In the case we have here, which is focused on data collected from the EPL, players are often running all over the place and identifying when they are getting tired can be important intel for both teams. When you have players playing well into their 40s, you want to make sure one of them isn't going to break a hip or something…
The NFL is doing similar fact finding with regards to force impact analysis. With so much attention on concussion rates and effects, you bet everyone is making sure they keep their $120 million franchise player safe and healthy.
Here's just an example of what is in the JSON information we receive; while it's not the 4.5 trillion frames per second…
Consistent Growth
1,500 members in the new Community.Talend.com INTERNAL ONLY
3M total downloads of Talend software to date since the company was founded (includes TOS + evals)
In 2016, we had 360,000 total downloads, up 14% since 2015 (total downloads include TOS + evals)
Engaged members:
Members: Our community members are “strategic partners” in solving data challenges—not just Talend challenges.
Talend Advocates: Small-to-medium SIs and VARs are some of the greatest Talend champions in the community. They share their technical expertise, and by sharing their knowledge they get visibility and find new customers
Thought Leaders: We’re about to launch a new Discussion Board about IoT/Smart Cities. By comparison, competitors use their forum for product support only.
The health of a community is measured by engagement, not just growth
User Groups:
Not only do we have community members that actively respond to questions on the forum ….
…. we also have customers who are creating and managing User Groups around the world (US, UK, Germany, France, Belgium, Switzerland, and India)
Our User Group in Portland, Maine, and Vancouver, Canada were launched by customers, and so were many others.
The Community Team is launching one NEW user group per quarter. In 2017, we plan to launch new user groups in Chicago, Dallas, Toronto, and Atlanta. Vancouver was launched in Q1.
Every day, we have about 400 online concurrent users.
Monetization:
Both Talend and the Talend partners know how to monetize the community.
Talend has been converting open source customers (e.g., Judicial Court of California, Mogo Finance Technology) from Open Studio to the commercial version, Talend Data Integration
And partners who are active on the community are finding new business (some of the most active members are SI partners)
Criteria
Creativity and uniqueness of use
Scope and complexity of project
Business transformation and improvement
Timeline
We are accepting entries until July 28, 2017. Hurry and send your entries now!
Winners will be notified in September.
Winners will be announced in November.
Eligibility Requirements
Award winners should be willing to have their story shared publicly on Talend web site (company logo, video and case study) and promoted on social media and in press announcements.