The document discusses the challenges of software engineering for cloud video and Internet of Things platforms. It describes the author's background developing Internet camera and cloud computing technologies through their company Skywatch. It then outlines various computer vision and deep learning techniques the company has developed for applications like video analytics, smart video summarization and captioning. Finally, it discusses considerations for distributed systems and edge computing architectures.
Polong Lin (林伯龍) / How to approach data science problems from start to end, Taiwan Data Science Conference (台灣資料科學年會)
Polong Lin is a Data Scientist at IBM. He is a regular speaker on data science and develops content for free data education on bigdatauniversity.com using open data tools on datascientistworkbench.com. Polong earned his M.Sc. at the Univ. of Tsukuba.
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
Despite the growing abundance of powerful tools, building and deploying machine-learning frameworks into production continues to be a major challenge, in both science and industry. I'll present some particular pain points and cautions for practitioners as well as recent work addressing some of the nagging issues. I advocate for a systems view, which, when expanded beyond the algorithms and codes to the organizational ecosystem, places some interesting constraints on the teams tasked with development and stewardship of ML products.
About: Dr. Joshua Bloom is an astronomy professor at the University of California, Berkeley where he teaches high-energy astrophysics and Python for data scientists. He has published over 250 refereed articles, largely on time-domain transient events and telescope/insight automation. His book on gamma-ray bursts, a technical introduction for physical scientists, was published recently by Princeton University Press. He is also co-founder and CTO of wise.io, a startup based in Berkeley. Josh has been awarded the Pierce Prize from the American Astronomical Society; he is also a former Sloan Fellow, Junior Fellow at the Harvard Society, and Hertz Foundation Fellow. He holds a PhD from Caltech and degrees from Harvard and Cambridge University.
Big Data and the Internet of Things (IoT) have the potential
to fundamentally shift the way we interact with our surroundings. The
challenge of deriving insights from the Internet of Things (IoT) has
been recognized as one of the most exciting and key opportunities for
both academia and industry. Advanced analysis of big data streams from
sensors and devices is bound to become a key area of data mining
research as the number of applications requiring such processing
increases. Dealing with the evolution over time of such data streams,
i.e., with concepts that drift or change completely, is one of the
core issues in stream mining. In this talk, I will present an
overview of data stream mining, and I will introduce
some popular open source tools for data stream mining.
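To make the idea of concept drift concrete, here is a minimal sketch of one classic change detector, the Page-Hinkley test, written in plain Python. It is not taken from the talk or from any particular stream mining tool; the class name, the `first_drift` helper, and the default values of `delta` and `threshold` are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Illustrative sketch only; parameter defaults are assumptions.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0                  # number of observations seen so far
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the mean
        self.min_cum = 0.0          # smallest cumulative deviation seen

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_drift(stream, **kwargs):
    """Return the index of the first drift alarm, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream whose mean jumps from 0 to 1 halfway through.
    stream = [0.0] * 100 + [1.0] * 100
    print("drift signalled at index:", first_drift(stream))
```

The detector processes one observation at a time and keeps only a few numbers in memory, which is the property that makes this style of method suitable for unbounded sensor streams.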
How to win data science competitions with Deep Learning Sri Ambati
Note: Please download the slides first, otherwise some links won't work!
How to win Kaggle-style data science competitions and influence decisions with R, Deep Learning and H2O's fast algorithms.
We take a few public and Kaggle datasets and build models to win competitions on accuracy and scoring speed.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Data Science with Spark - Training at SparkSummit (East) Krishna Sankar
Slideset of the training we gave at the Spark Summit East.
Blog : https://doubleclix.wordpress.com/2015/03/25/data-science-with-spark-on-the-databricks-cloud-training-at-sparksummit-east/
Video is posted on YouTube: https://www.youtube.com/watch?v=oTOgaMZkBKQ
Using Deep Learning to do Real-Time Scoring in Practical Applications Greg Makowski
http://www.meetup.com/SF-Bay-ACM/events/227480571/
(see also YouTube for a recording of the presentation)
The talk will cover a brief review of neural network basics and the following types of neural network deep learning:
* autocorrelational - unsupervised learning for extracting features. He will describe how additional layers build complexity in the feature extraction.
* convolutional - how to detect shift-invariant patterns in various data sources. Horizontal shift-invariant detection applies to signals like speech recognition or IoT data. Horizontal and vertical shift invariance applies to images or videos, for faces or self-driving cars.
* discuss details of applying deep net systems for continuous or real time scoring
* reinforcement learning or Q Learning - such as learning how to play Atari video games
* continuous space word models - such as word2vec, skipgram training, NLP understanding and translation
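The shift invariance mentioned in the convolutional item above can be illustrated with a small one-dimensional sketch in plain Python: the same kernel is slid over the signal, and taking the maximum response gives the same detection strength wherever the pattern occurs. The function names and the toy pattern are assumptions made for illustration and are not from the talk.

```python
def correlate_valid(signal, kernel):
    """Slide the kernel over the signal ('valid' cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def detect(signal, kernel):
    """Return (peak response, position of the peak)."""
    response = correlate_valid(signal, kernel)
    peak = max(response)  # global max pooling over all positions
    return peak, response.index(peak)


# Toy pattern, used both as the thing to find and as the filter.
pattern = [1.0, 2.0, 1.0]


def embed(offset, length=12):
    """Place the pattern at the given offset inside a zero signal."""
    signal = [0.0] * length
    signal[offset:offset + len(pattern)] = pattern
    return signal


if __name__ == "__main__":
    for offset in (2, 7):
        peak, position = detect(embed(offset), pattern)
        print(f"offset={offset}: peak={peak}, found at {position}")
```

The peak value is identical for both offsets and only the reported position changes, which is the horizontal shift invariance described for speech or IoT signals; images add a second axis to the same idea.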
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Evolving data streams are contributing to the growth of data created over the last few years. We are creating the same quantity of data every two days as we created from the dawn of time up until 2003. Evolving data stream methods are becoming a low-cost, green methodology for real-time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges that the field will have to overcome in the coming years.
A simplified way of approaching machine learning and deep learning from the ground up. The case for deep learning and an attempt to develop intuition for how/why it works. Advantages, state-of-the-art, and trends.
Presented at NYU Center for Genomics for NY Deep Learning Meetup
AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306) Amazon Web Services
For many companies, recommendation systems solve important machine learning problems. But as recommendation systems grow to millions of users and millions of items, they pose significant challenges when deployed at scale. The user-item matrix can have trillions of entries (or more), most of which are zero. To make common ML techniques practical, sparse data requires special techniques. Learn how to use MXNet to build neural network models for recommendation systems that can scale efficiently to large sparse datasets.
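As a rough illustration of why sparsity matters, the sketch below stores only the observed (user, item, value) triples and fits a small matrix factorization by stochastic gradient descent, so the cost per pass depends on the number of nonzero entries rather than on users times items. It is plain Python and does not use MXNet; the function names, hyperparameters, and toy ratings are all assumptions for illustration, not the session's model.

```python
import random


def train_mf(ratings, n_users, n_items, dim=2, lr=0.1, reg=0.01,
             epochs=300, seed=0):
    """Fit user and item embeddings from the observed entries only.

    ratings: list of (user, item, value) triples, i.e. the nonzero cells.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:  # never touches the unobserved cells
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(dim):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def mse(ratings, P, Q):
    """Mean squared error over the observed entries."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return total / len(ratings)


if __name__ == "__main__":
    # Toy data: 5 observed cells out of a 3 x 3 matrix.
    ratings = [(0, 0, 1.0), (0, 1, 0.8), (1, 0, 0.9), (1, 1, 0.7), (2, 2, 1.0)]
    P0, Q0 = train_mf(ratings, 3, 3, epochs=0)
    P, Q = train_mf(ratings, 3, 3)
    print("error before training:", mse(ratings, P0, Q0))
    print("error after training:", mse(ratings, P, Q))
```

A neural recommender replaces the dot product with a learned network, but the same principle of iterating over nonzero entries is what keeps training tractable at the scale described above.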
Note: Make sure to download the slides to get the high-resolution version!
Also, you can find the webinar recording here (please also download for better quality): https://www.dropbox.com/s/72qi6wjzi61gs3q/H2ODeepLearningArnoCandel052114.mov
Come hear how Deep Learning in H2O is unlocking never-before-seen performance for prediction!
H2O is a Google-scale open source machine learning engine for R & Big Data. Enterprises can now use all of their data without sampling and build intelligent applications. This live webinar introduces Distributed Deep Learning concepts, implementation and results from recent developments. Real-world classification & regression use cases from the eBay text dataset, MNIST handwritten digits and Cancer datasets will present the power of this game-changing technology.
Alex Tellez's slides on Deep Learning Applications, including using auto-encoders, finding better Bordeaux wine, and fighting crime in Chicago, from the 3/11/15 Meetup at H2O.ai HQ and the 3/12/15 Meetup at Mills College.
MLconf - Distributed Deep Learning for Classification and Regression Problems... Sri Ambati
Video recording (no audio?): http://new.livestream.com/accounts/7874891/events/3565981/videos/68114143 from 32:00 to 54:30
Deep Learning has been dominating recent machine learning competitions with better predictions. Unlike the neural networks of the past, modern Deep Learning methods have cracked the code for training stability and generalization. Deep Learning is not only the leader in image and speech recognition tasks, but is also emerging as the algorithm of choice for highest predictive performance in traditional business analytics. This talk introduces Deep Learning and implementation concepts in the open-source H2O in-memory prediction engine. Designed for the solution of business-critical problems on distributed compute clusters, it offers advanced features such as adaptive learning rate, dropout regularization, parameter tuning and a fully-featured R interface. World record performance on the classic MNIST dataset, best-in-class accuracy for a high-dimensional eBay text classification problem and other relevant datasets showcase the power of this game-changing technology. A whole new ecosystem of Intelligent Applications is emerging with Deep Learning at its core.
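One way to picture the adaptive learning rate mentioned above is an ADADELTA-style update, in which the step size is derived from running averages of past squared gradients and past squared steps instead of a hand-tuned rate. The sketch below is a plain-Python illustration on a one-dimensional function; it assumes an ADADELTA-style rule, and the function name and the `rho` and `eps` defaults are illustrative choices rather than the engine's actual code.

```python
import math


def adadelta_minimize(grad, x0, steps, rho=0.95, eps=1e-6):
    """Run ADADELTA-style updates on a scalar parameter.

    grad: function returning the gradient at x.
    rho:  decay factor of the running averages (assumed default).
    eps:  small constant that keeps the ratio well defined (assumed default).
    """
    x = x0
    avg_sq_grad = 0.0  # running average of squared gradients
    avg_sq_step = 0.0  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # Per-parameter step: ratio of the two root-mean-square terms.
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step * step
        x += step
    return x


if __name__ == "__main__":
    # Minimise f(x) = x^2, whose gradient is 2x, starting from x = 1.
    for steps in (5, 20):
        print(steps, "steps ->", adadelta_minimize(lambda x: 2 * x, 1.0, steps))
```

No global learning rate appears in the update; the early steps are small and grow as the running averages warm up.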
Bio:
Prior to joining 0xdata as Physicist & Hacker, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives. While at SLAC, he authored the first curvilinear finite-element simulation code for space-charge dominated relativistic free electrons and scaled it to thousands of compute nodes. He also led a collaboration with CERN to model the electromagnetic performance of CLIC, a ginormous e+e- collider and potential successor of LHC. Arno has authored dozens of scientific papers and was a sought-after academic conference speaker. He holds a PhD and Masters summa cum laude in Physics from ETH Zurich. Arno was named 2014 Big Data All-Star by Fortune Magazine.
Distributed Deep Learning with Hadoop and TensorFlow Jan Wiegelmann
Training deep neural nets can take a long time and heavy resources. By leveraging existing distributed versions of TensorFlow and Hadoop, we can train neural nets quickly and efficiently.
Suggestions:
1) For best quality, download the PDF before viewing.
2) Open at least two windows: one for the YouTube video, one for the screencast (link below), and optionally one for the slides themselves.
3) The YouTube video is shown on the first page of the slide deck; for the slides, just skip to page 2.
Screencast: http://youtu.be/VoL7JKJmr2I
Video recording: http://youtu.be/CJRvb8zxRdE (Thanks to Al Friedrich!)
In this talk, we take Deep Learning to task with real world data puzzles to solve.
Data:
- Higgs binary classification dataset (10M rows, 29 cols)
- MNIST 10-class dataset
- Weather categorical dataset
- eBay text classification dataset (8500 cols, 500k rows, 467 classes)
- ECG heartbeat anomaly detection
Deep learning goes beyond the traditional machine learning of big data and analytics. In this session, we will review the AWS offering, Amazon Machine Learning, and the AWS GPU-intensive family of servers that run native machine learning and deep-learning algorithms. We will also cover some basic deep-learning algorithms using open source software. Session sponsored by Day1 Solutions.
Scalable Data Science and Deep Learning with H2O odsc
The era of Big Data has passed, and the era of sensory overload – that is, the proliferation of sensor data – is upon us. The challenge today is how to create the next generation of business and consumer applications that transform how we interact with sensors themselves. Applications need to learn from every user interaction and data point and predict what can happen next. The future depends on Machine Learning, as much as it depends on the data itself, to change the way we interact with these systems.
In this talk, we explain H2O’s scalable distributed in-memory math architecture and its design principles. The platform was built alongside (and on top of) both Hadoop and Spark clusters and includes interfaces for R, Python, Scala, Java, JavaScript and JSON, along with its interactive graphical Flow interface that makes it easier for non-engineers to stitch together complete analytic workflows. We outline the implementation of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting and Deep Learning. We will present a broad range of use cases and live demos that include world-record deep learning models, anomaly detection tools and approaches for Kaggle data science competitions. We also demonstrate the applicability of H2O in enterprise environments for real-world customer production use cases. By the end of this presentation, you will know how to create your own machine learning workflows on your data using R, Python (IPython Notebooks) or the Flow GUI.
Machine Learning for Smarter Apps - Jacksonville Meetup Sri Ambati
Machine Learning for Smarter Apps with Tom Kraljevic
Presents the basic H2O.ai components and model deployment pipeline, along with a benchmark of the scalability, speed and accuracy of machine learning libraries for classification, taken from https://github.com/szilard/benchm-ml.
Shou-de Lin is currently a full professor in the CSIE department of National Taiwan University. He holds a BS in EE from National Taiwan University, an MS-EE from the University of Michigan, and an MS in Computational Linguistics and a PhD in Computer Science, both from the University of Southern California. He leads the Machine Discovery and Social Network Mining Lab at NTU. Before joining NTU, he was a post-doctoral research fellow at the Los Alamos National Lab. Prof. Lin's research includes the areas of machine learning and data mining, social network analysis, and natural language processing. His international recognition includes the best paper award at the IEEE Web Intelligent conference 2003, a Google Research Award in 2007, a Microsoft research award in 2008, the merit paper award at TAAI 2010, the best paper award at ASONAM 2011, and the US Aerospace AFOSR/AOARD research award for 5 years. He is the all-time leading winner of the ACM KDD Cup, having led or co-led the NTU team to 5 championships. He also led a team to win the WSDM Cup 2016 championship. He has served as senior PC member for SIGKDD and area chair for ACL. He is currently an associate editor for the International Journal on Social Network Mining, the Journal of Information Science and Engineering, and the International Journal of Computational Linguistics and Chinese Language Processing. He received the Young Scholars' Creativity Award from the Foundation for the Advancement of Outstanding Scholarship and the Ta-You Wu Memorial Award.
Shih-Fen Cheng (鄭世昐) / Mobility on Demand for Future Cities (未來城市的任意門), Taiwan Data Science Conference (台灣資料科學年會)
Shih-Fen Cheng is Associate Professor of Information Systems and Deputy Director of the Fujitsu-SMU Urban Computing and Engineering Corp Lab at the Singapore Management University. He received his Ph.D. degree in industrial and operations engineering from the University of Michigan, Ann Arbor, and B.S.E. degree in mechanical engineering from the National Taiwan University.
His research focuses on the modeling and optimization of complex systems in engineering and business domains. He is particularly interested in the application areas of transportation, computational markets, and human decision-making. He is a member of INFORMS, AAAI, and IEEE, and serves as Area Editor for Electronic Commerce Research and Applications.
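謝宗震, a data scientist at DSP (智庫驅動) with a PhD in statistics from National Tsing Hua University, is passionate about promoting statistical methods and tools and hopes to use statistical thinking and analytical tools to help solve problems in every field. 謝宗震 has coached more than 300 people from government, business, and nonprofit organizations to become data analysts, and is a co-founder of the Data for Social Good (D4SG) project, which is building an exchange and matchmaking platform around the idea of "data power for public good" (「資料力,做公益」).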
Chen-Yu Chiang (江振宇) / It's Not What You Say: It's How You Say It!, Taiwan Data Science Conference (台灣資料科學年會)
Chen-Yu Chiang was born in Taipei, Taiwan, in 1980. He received the B.S., M.S., and Ph.D. degrees in communication engineering from National Chiao Tung University (NCTU), Hsinchu, Taiwan, in 2002, 2004, and 2009, respectively. In 2009, he was a Postdoctoral Fellow at the Department of Electrical Engineering, NCTU, where he primarily worked on prosody modeling for automatic speech recognition and text-to-speech systems, under the guidance of Prof. Sin-Horng Chen. In 2012, he was a Visiting Scholar at the Center for Signal and Image Processing (CSIP), Georgia Institute of Technology, Atlanta. Currently he is the director of the Speech and Multimedia Signal Processing Lab and an assistant professor at the Department of Communication Engineering, National Taipei University. His main research interests are in speech processing, in particular prosody modeling, automatic speech recognition and text-to-speech systems.
Jane Hsu is a professor and department chair of Computer Science and Information Engineering at National Taiwan University. Her research interests include multi-agent systems, intelligent data analysis, commonsense knowledge, and context-aware computing. Prof. Hsu is the director of the Intel-NTU Connected Context Computing Center, featuring global research collaboration among NTU, Intel, and the National Science Council of Taiwan. She serves on the editorial board of Journal of Information Science and Engineering (2010-), International Journal of Service Oriented Computing and Applications (Springer, 2007-2009) and Intelligent Data Analysis (Elsevier/IOS Press, 1997-2002). She is actively involved in many key international AI conferences as an organizer and program committee member. In addition to serving as the President of Taiwanese Association for Artificial Intelligence (2013-2014), Prof. Hsu has been a member of AAAI, IEEE, ACM, Phi Tau Phi, and an executive committee member of the IEEE Technical Committee on E-Commerce (2000) and TAAI (2004-current).
Yi-Hsuan Yang is an Associate Research Fellow with Academia Sinica. He received his Ph.D. degree in Communication Engineering from National Taiwan University in 2010, and became an Assistant Research Fellow in Academia Sinica in 2011. He is also an Adjunct Associate Professor with the National Tsing Hua University, Taiwan. His research interests include music information retrieval, machine learning and affective computing. Dr. Yang was a recipient of the 2011 IEEE Signal Processing Society (SPS) Young Author Best Paper Award, the 2012 ACM Multimedia Grand Challenge First Prize, and the 2014 Ta-You Wu Memorial Research Award of the Ministry of Science and Technology, Taiwan. He is an author of the book Music Emotion Recognition (CRC Press 2011) and a tutorial speaker on music affect recognition in the International Society for Music Information Retrieval Conference (ISMIR 2012). In 2014, he served as a Technical Program Co-chair of ISMIR, and a Guest Editor of the IEEE Transactions on Affective Computing and the ACM Transactions on Intelligent Systems and Technology.
國立臺灣大學電機所博士生,平時致力於推廣 R 語言,曾主辦多場 R 語言推廣講座,並經常於 Taiwan R User Group 分享 R 的使用心得。有豐富的 R 語言實務經驗,包含資料的收集、整理、分析到報告製作。擅長根據專案需求,量身打造 R 的資料分析系統,以及運用 R 和 C++ 撰寫高效能演算法。
44CON 2014 - I Hunt TR-069 Admins: Pwning ISPs Like a Boss, Shahar Tal44CON
44CON 2014 - I Hunt TR-069 Admins: Pwning ISPs Like a Boss, Shahar Tal
Residential gateway (/SOHO router) exploitation is a rising trend in the security landscape - ever so often do we hear of yet another vulnerable device, with the occasional campaign targeted against specific versions of devices through independent scanning or Shodan dorking. We shine a bright light on TR-069/CWMP, the previously under-researched, de-facto CPE device management protocol, and specifically target ACS (Auto Configuration Server) software, whose pwnage can have devastating effects on critical amounts of users. These servers are, by design, in complete control of entire fleets of consumer premises devices, intended for use by ISPs and Telco providers. or nation-state adversaries, of course (sorry NSA, we know it was a cool attack vector with the best research-hours-to-mass-pwnage ratio). We investigate several TR-069 ACS platforms, and demonstrate multiple instances of poorly secured deployments, where we could have gained control over hundreds of thousands of devices. During the talk (pending patch availability), we will release exploits to vulnerabilities we discovered in ACS software, including RCEs on several platforms.
Abusing bleeding edge web standards for appsec gloryPriyanka Aash
"Through cooperation between browser vendors and standards bodies in the recent past, numerous standards have been created to enforce stronger client-side control for web applications. As web appsec practitioners continue to shift from mitigating vulnerabilities to implementing proactive controls, each new standard adds another layer of defense for attack patterns previously accepted as risks. With the most basic controls complete, attention is shifting toward mitigating more complex threats. As a result of the drive to control for these threats client-side, standards such as SubResource Integrity (SRI), Content Security Policy (CSP), and HTTP Public Key Pinning (HPKP) carry larger implementation risks than others such as HTTP Strict Transport Security (HSTS). Builders supporting legacy applications actively make trade-offs between implementing the latest standards versus accepting risks simply because of the increased risks newer web standards pose.
In this talk, we'll strictly explore the risks posed by SRI, CSP, and HPKP; demonstrate effective mitigation strategies and compromises which may make these standards more accessible to builders and defenders supporting legacy applications; as well as examine emergent properties of standards such as HPKP to cover previously unforeseen scenarios. As a bonus for the breakers, we'll explore and demonstrate exploitations of the emergent risks in these more volatile standards, to include multiple vulnerabilities uncovered quite literally during our research for this talk (which will hopefully be mitigated by d-day)."
(Source: Black Hat USA 2016, Las Vegas)
The Internet of Fails - Mark Stanislav, Senior Security Consultant, Rapid7Rapid7
The Internet of Fails - Where IoT (the Internet of Things) has gone wrong and how we’re making it right. By Mark Stanislav @mstanislav, Senior Security Consultant, Rapid7
In the design of electronics and semiconductors, challenges are compounded by the integration of AI, multi-core, real-time software, network, connectivity, diagnostics, and security. Performance limits, battery life, and cost are adoption barriers. It is extremely important to have tools and processes that deliver efficiency throughout the design cycle.
Continuous verification from planning to development addresses the multi-discipline needs of hardware, software, and networks. This unique approach accelerates the design phase, defines the test efforts, and finds defects during specification. Architecture modeling is required to meet timing deadlines, generate the lowest power consumption, and attain the highest Quality-of-Service. optimize the electronic design system and designing of custom components.
This is a fun one! Learn how to hack up robots you can buy at a local toy store. You’ll see the methods used to take the video stream out of the robot and turn it into a format Flash likes. You’ll get the lowdown on how to send API commands to control the bot. We’ll show you how to connect it to alternative controllers and use ActionScript for some simple color detection on the video stream.
This is a fun one! Learn how to hack up robots you can buy at a local toy store. You’ll see the methods used to take the video stream out of the robot and turn it into a format Flash likes. You’ll get the lowdown on how to send API commands to control the bot. We’ll show you how to connect it to alternative controllers and use ActionScript for some simple color detection on the video stream.
Heartbleed Bug Vulnerability: Discovery, Impact and SolutionCASCouncil
Join the CASC Wednesday April 30 for a Google+ hangout on the Heartbleed Bug. We’ll cover everything from what the bug does to how to tell if your site is at risk and how certificate authorities are responding.
Panel of CASC members:
• Robin Alden- Comodo
• Jeremy Rowley- DigiCert
• Bruce Morton- Entrust
• Rick Andrews- Symantec
• Wayne Thayer- Go Daddy
Watch the recording: http://bit.ly/1jAQCtk
A 60-slide survey of the Internet of things: market philosophy and theory. Philosophy: Horizontal IoT platforms are stupid. Build something people love. You earn the right for others to base their business upon yours with deeply entrenched vertical value. Making: a survey of a few elements to crafting connected products. Local connectivity, Intelligence, internet connectivity, and – if you insist – IoT platforms.
Top IT Management Practices for Government EntitiesSolarWinds
For more information, visit: http://www.solarwinds.com/federal_government/it-management-solutions-for-government.aspx
Watch this webcast: http://www.solarwinds.com/resources/webcasts/top-it-management-practices-for-government-entities.html
Combining the unique and evolving IT infrastructure requirements of government entities, IT engineers and administrators are challenged on a daily basis. Being tasked with with ensuring networks, servers and applications are running at peak performance while maintaining security and reducing operating costs is no easy feat. In this webinar, we will provide some top tips for satisfying these needs, such as:
• Monitoring network performance and traffic
• Ensuring compliance and security
• Monitoring and controlling virtualization
• Managing your servers and apps
• Monitoring, capacity planning and reclaiming storage
SolarWinds experts, Head Geek Josh Stephens and Sales Engineer Sean Martinez, will demo the SolarWinds portfolio of IT management products and show you just how easy it can be to manage your environment.
Modern Web Security, Lazy but Mindful Like a FoxC4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2hYU0cd.
Albert Yu presents a few viable, usable and effective defensive techniques that developers have often overlooked. Filmed at qconsf.com.
Albert Yu is currently working as a principal engineer for the Trust Engineering team in Atlassian. He has spent 15 years exposing himself to many different aspects of a security program, including security engineering, R&D, product reviews, code review, penetration test, governance and compliance, risk management, incident response, in large scale environment.
Beyond websites using drupal for digital signsAcquia
Drupal 8 can power experiences beyond the traditional web. As more data rich APIs become available, Drupal can be used to accumulate data, identify a variety of devices in an Internet of Things network and then route data to the appropriate places.
Given Drupal’s own rich content management capabilities, the CMS can still be utilized to enhance this datastream - making it that much more relevant based on location, language or any other metadata stored in it. In this presentation we will demonstrate how to use Drupal 8 to power a real-time signage system and discuss the techniques to build your own!
What’s Covered:
Responsive Techniques to support different display sizes.
ADA rules around public signage. We’re not just talking WCAG/508 anymore!
How to rebroadcast data from other sources.
Data Delivery Methods: Push and Pull models.
Sizing and Scaling your network of Signs.
Fault tolerance on your Kiosk.
Why even use Drupal to power a sign?
Software Engineering Challenges of Cloud Video and IoT Platforms: The Case of Skywatch - 陳維超
50. SMART FAST FORWARD
•20x average speedup
•Compressed domain processing
•30x faster processing compared to competition
•3 seconds for every 10 minutes
52. SMART DAILY
•Intelligent video summary
•Daily email on your past day’s activity
- Compressed domain analysis
- Very fast: totally I/O bound
- Detect events based on activity / sensible motion
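The slides do not say which compressed-domain signal is used, so the following is only a minimal sketch of one common idea: in inter-coded video, frames with more motion tend to compress to more bytes, so per-frame sizes can be scanned for activity without decoding any pixels. The function name `active_segments` and the parameters `fps`, `ratio`, and `min_len` are hypothetical.

```python
from statistics import median


def active_segments(frame_sizes, fps=10, ratio=2.0, min_len=3):
    """Return (start_sec, end_sec) spans whose frame sizes exceed ratio * median.

    frame_sizes: compressed size in bytes of each inter-coded frame (assumed input).
    min_len: minimum number of consecutive large frames that counts as an event.
    """
    if not frame_sizes:
        return []
    threshold = ratio * median(frame_sizes)
    spans = []
    start = None
    for i, size in enumerate(frame_sizes):
        if size > threshold:
            if start is None:
                start = i  # an active run begins here
        elif start is not None:
            if i - start >= min_len:
                spans.append((start / fps, i / fps))
            start = None
    # Close a run that is still open at the end of the stream.
    if start is not None and len(frame_sizes) - start >= min_len:
        spans.append((start / fps, len(frame_sizes) / fps))
    return spans


# Synthetic example: quiet scene, one second of motion, quiet again.
sizes = [100] * 20 + [900] * 10 + [100] * 20
print(active_segments(sizes))
```

Because the scan only reads container metadata, its cost is dominated by reading the file, which fits the "totally I/O bound" remark on the slide.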
55. SMART CAPTION
•Accuracy not critical for application (?!)
•Deep learning classification
•Berkeley Caffe
•ImageNet / Microsoft COCO
•Stanford NeuralTalk
•GPU / CUDA powered
•1 second per image
58. FEATURE MATRIX
[Chart: features plotted by Training Cost and Runtime Cost. Features shown: Face Detect, Loitering, Trip wire, Area Alert, Left object, Motion, License Plate, People Detection, Face Recognition, Object Classification, Age, Gender, People Count, Heat Map, Smart Caption, Smart Fast Forward, Smart Daily. Legend: Skywatch, Regular IVA]
63. CLOUD VS LOCAL
 | Cloud | Local
How? | Cheap camera, cloud computer | Special camera, local computer
Best for | Search / classify, big data | Real-time alerts
Advantages | Easy to deploy / manage, cheap camera, rental model | Lower bandwidth, real-time applications
Disadvantages | Higher bandwidth, maybe not for alerts | Difficult to deploy / manage, expensive camera, higher upfront cost
65. WE HAVE BUILT STUFF
•Chief shader architect for PlayStation 3 (RSX)
•75% of the GPU chip
•Expensive bugs ($100K USD each)
•Targeted test, random test, monkey test
•From architecture to driver to hardware
•Shipped over 100M systems
66. WE HAVE INVENTED STUFF
•OpenCV
•MPEG4 Source / OpenLF
•Fancy publications
•US Patents
67. HOW HARD CAN IT BE?
Get the plumbing done, then add smartness
69. SOFTWARE ENGINEERING
Aspects | What do we do?
Development | Scrum / continuous integration
Testing | Unit / regression testing
Review | Design review / design patterns, code review
Operation | Push-button code deployment, server monitors / camera diagnostic tools
What about the cameras?
78. #1: BOB SAW ALICE (AND MAYBE CHARLES)
[Diagram: Alice, Bob, Charles]
Very early mistake. This won’t happen again.
79. #1: BOB SAW ALICE (AND MAYBE CHARLES)
•How did this happen?
•Not thread-safe / process-safe
•Unique IDs are not unique (after a certain scale)
•Loose DB network synchronisation
•Old tunnels hijacked by new cameras
•Camera bug: HTTP authentication was off in one firmware version
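Several of the causes above (IDs that stop being unique at scale, code that is not thread-safe, old tunnels reused by new cameras) can be illustrated with a small sketch. This is not Skywatch's implementation: `TunnelRegistry` and its methods are hypothetical names, and the choice of random `uuid4` identifiers guarded by a lock is an assumption made for illustration.

```python
import threading
import uuid


class TunnelRegistry:
    """Maps tunnel IDs to camera IDs (hypothetical stand-in for a routing table)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tunnels = {}

    def open_tunnel(self, camera_id):
        # Random 128-bit ID instead of a counter or timestamp that can repeat.
        tunnel_id = uuid.uuid4().hex
        with self._lock:  # check-and-insert must be atomic across threads
            if tunnel_id in self._tunnels:
                raise RuntimeError("tunnel ID collision")
            self._tunnels[tunnel_id] = camera_id
        return tunnel_id

    def camera_for(self, tunnel_id):
        with self._lock:
            return self._tunnels.get(tunnel_id)

    def close_tunnel(self, tunnel_id):
        # Removing the mapping keeps a stale tunnel from resolving to a newer camera.
        with self._lock:
            self._tunnels.pop(tunnel_id, None)


registry = TunnelRegistry()
opened = []


def worker(camera_id):
    for _ in range(200):
        opened.append(registry.open_tunnel(camera_id))


threads = [threading.Thread(target=worker, args=(f"cam-{n}",)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(opened), len(set(opened)))
```

A lock only covers threads inside one process; the "process-safe" and database synchronisation problems on the slide would need the same uniqueness check enforced in shared storage, for example with a unique constraint.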
80. #1: BOB SAW ALICE (AND MAYBE CHARLES)
•Why won’t it happen again?
•HTTP password authenticated
•Individually signed, CRC’ed files
•Scanner to make sure camera authentication is on
•Channel scanner to clean up dangling connections
•RESTful API check for signature
[Diagram: Alice, Bob, Charles]
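The slide does not describe the signature scheme or file format, so the sketch below is an assumption-laden illustration of the two checks it names: an API request signature (here HMAC-SHA256 with a per-camera secret) and a CRC check on a stored file (here CRC-32). The function names and the message layout are hypothetical.

```python
import hashlib
import hmac
import zlib


def sign_request(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 over method, path, and body (assumed layout)."""
    message = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, body: bytes,
                   signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret, method, path, body)
    return hmac.compare_digest(expected, signature)


def file_is_intact(data: bytes, expected_crc: int) -> bool:
    """Detect accidental corruption; a CRC alone does not prove authenticity."""
    return zlib.crc32(data) == expected_crc


secret = b"per-camera-secret"  # hypothetical key, provisioned per device
sig = sign_request(secret, "GET", "/cameras/alice/stream", b"")
print(verify_request(secret, "GET", "/cameras/alice/stream", b"", sig))
print(verify_request(secret, "GET", "/cameras/bob/stream", b"", sig))
```

Binding the path into the signed message is the point of the example: a signature issued for Alice's stream does not validate a request for Bob's.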
81. #2: FIRMWARE UPGRADE EQUALS ANGRY CUSTOMERS
•Auto upgrade?
•Bad idea: people unplug and brick the camera
•User-triggered OTA upgrade
•Camera restores to default, WiFi settings gone
•Continuous upgrade testing is needed
•Monitor customer upgrade events / failures
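As a rough sketch of what monitoring upgrade events and failures could look like, the snippet below aggregates upgrade outcomes per firmware version and flags versions whose failure rate is too high. The event format, the function name `failing_versions`, and both thresholds are assumptions for illustration, not details from the talk.

```python
from collections import Counter


def failing_versions(events, max_failure_rate=0.05, min_attempts=20):
    """Return firmware versions whose upgrade failure rate exceeds the limit.

    events: iterable of (target_version, succeeded) tuples (assumed format).
    min_attempts: ignore versions with too few upgrades to judge.
    """
    attempts = Counter()
    failures = Counter()
    for version, succeeded in events:
        attempts[version] += 1
        if not succeeded:
            failures[version] += 1
    return sorted(
        version
        for version, count in attempts.items()
        if count >= min_attempts and failures[version] / count > max_failure_rate
    )


# Synthetic data: 1.2 fails 5% of the time, 1.3 fails 20% of the time.
events = (
    [("1.2", True)] * 95 + [("1.2", False)] * 5
    + [("1.3", True)] * 80 + [("1.3", False)] * 20
)
print(failing_versions(events))
```

A check like this could gate further rollout of a version, which complements the continuous upgrade testing mentioned above.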
82. #3: JAVA - WRITE ONCE, DEBUG EVERYWHERE
•Browser H.264 decoder / multi-channel player
•Originally Java-based. Why?
•Because a big company is using it
•Because Flash was killed by Jobs
•Because we can save money (P2P)
•Tons of support issues (!)
[Diagram: Browser Player, Flash / HTML5 player]
83. #3: JAVA - WRITE ONCE, DEBUG EVERYWHERE
•Browser H.264 decoder / multi-channel player
•Originally Java-based. Why?
•Because a big company is using it
•Because Flash was killed by Jobs
•Because we can save money (P2P)
[Diagram: Browser Player, Flash / HTML5 player]
Believe in yourself, not the competitor
Choose a popular weapon
Worry about money saving when it becomes a real issue