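On 2009.1.9, served as guest speaker in Professor 吳泉源's EMBA class at Tsing Hua University, introducing the state of Internet development since Web 2.0, with "next-generation digital philanthropy" as the focus for exploring how network connections that exert real influence become possible.
The "Meaning of the Internet" part cites the arguments of Mochio Umeda (梅田望夫) and Clay Shirky, along with the example of Obama in the U.S. presidential election as presented by Professor Dr. Michael Nelson.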
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem
Liwei Ren 任力偉
The bad character shift rule of the Boyer-Moore string search algorithm is studied in this paper with the aim of extending it to more general string matching problems. An abstract string matching problem is defined in general terms. An optimized string matching algorithm based on the bad character heuristic is proposed to solve the abstract matching problem efficiently.
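For readers unfamiliar with the bad character rule, the following is a minimal sketch of it in its simplest form, the Boyer-Moore-Horspool variant, for ordinary exact matching. It is not the extended algorithm proposed in the paper; the function name `horspool_search` is chosen here for illustration only.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Illustrative Boyer-Moore-Horspool search: only the bad character
    shift is used, keyed on the text character aligned with the last
    pattern position.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # For each character in pattern[:-1], the shift is the distance from
    # its rightmost occurrence to the end of the pattern. Later indices
    # overwrite earlier ones, so the rightmost occurrence wins.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}

    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        # Characters absent from pattern[:-1] allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For example, `horspool_search("abracadabra", "abra")` returns `[0, 7]`: after the first match the window moves by 3, then the mismatching `d` permits a full shift of 4.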
Near Duplicate Document Detection: Mathematical Modeling and Algorithms
Liwei Ren 任力偉
Near-duplicate document detection is a well-known problem in information retrieval, and solving it is important for many applications in the IT industry. It has been studied extensively in the research literature. This article provides a novel solution to this classic problem. We present the problem with abstract models along with additional concepts such as text models, document fingerprints and document similarity. With these concepts, the problem can be transformed into a keyword-like search problem with results ranked by document similarity. There are two major techniques. The first is to extract robust and unique fingerprints from a document. The second is to calculate document similarity effectively. Algorithms for both fingerprint extraction and document similarity calculation are introduced as a complete solution.
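The two techniques named above, fingerprint extraction and similarity calculation, can be illustrated with a generic sketch. It assumes word-level shingles hashed with SHA-1 as the fingerprints and Jaccard overlap as the similarity measure; these are common textbook choices, not necessarily the models or algorithms of the article, and the names `fingerprints` and `similarity` are hypothetical.

```python
import hashlib


def fingerprints(text, k=3):
    """Return a set of hashed k-word shingles (assumed fingerprint model)."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    # hashlib gives hashes that are stable across runs, unlike built-in hash().
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def similarity(doc_a, doc_b, k=3):
    """Jaccard similarity of the two documents' fingerprint sets."""
    fa, fb = fingerprints(doc_a, k), fingerprints(doc_b, k)
    if not fa and not fb:
        return 1.0  # two documents too short to fingerprint are treated as equal
    return len(fa & fb) / len(fa | fb)
```

With this representation, a query document's fingerprints can be looked up in an inverted index much like keywords, and candidate documents ranked by the similarity score.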
Binary Similarity: Theory, Algorithms and Tool Evaluation
Liwei Ren 任力偉
Similarity digesting is a class of algorithms and technologies that generate hashes from files while preserving file similarity. They find applications in various areas across the security industry: malware variant detection, spam filtering, computer forensic analysis, data loss prevention, etc. A few schemes and tools are available, including ssdeep, sdhash and TLSH. While useful for detecting file similarity, they define similarity from different perspectives. In other words, they take different approaches to describing what file similarity is. To compare and evaluate these tools more rigorously, we introduce a simple mathematical model of similarity that covers all three schemes and beyond. This model enables us to establish a theoretical framework for analyzing the essential differences between various similarity digesting algorithms and tools. As a result, a few tools are found to be complementary to each other, so that they can be used in a hybrid approach in practice. Experimental results are provided to support the theoretical analysis. In addition, we introduce a novel similarity digesting scheme that was designed based on the mathematical model.
IoT Security: Problems, Challenges and Solutions
Liwei Ren 任力偉
As a novel networked computing platform, IoT will bring many security challenges to enterprise networks and create new opportunities for the security industry. This talk will provide a general overview of the enterprise network security problems caused by IoT, especially those concerning data security. After that, a few existing security technologies are evaluated as necessary elements of a holistic network security approach that covers IoT devices. These technologies include: (a) IoT security monitoring and control; (b) FOTA for firmware vulnerability management; (c) NetFlow-based big data security analysis. In the end, the practice of standard security protocols (such as OpenIoC and IODEF) will be strongly advocated for delivering effective IoT security solutions.
Differential compression (also known as delta encoding) is a special category of data de-duplication. It has applications in many domains, such as data backup, software revision control systems, incremental software updates and file synchronization over a network, to name just a few. This talk will introduce a taxonomy for categorizing delta encoding schemes across these applications. The pros and cons of each scheme will be investigated in depth.
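As a rough illustration of the idea behind delta encoding, the sketch below encodes a new version of a string as "copy" and "insert" instructions against an old version and then reconstructs it. It relies on Python's standard `difflib.SequenceMatcher` to find the shared blocks; the instruction format and the names `make_delta` and `apply_delta` are assumptions made for this example and do not correspond to any scheme in the talk's taxonomy.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode new as a list of instructions relative to old.

    ("copy", start, end) reuses old[start:end];
    ("insert", data) carries literal data found only in new.
    """
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no instruction: the old range is simply not copied.
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from old plus the delta instructions."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

Only the literal data in the "insert" instructions has to be stored or transmitted, which is why the delta is small when the two versions share most of their content.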
Bytewise Approximate Match: Theory, Algorithms and Applications
Liwei Ren 任力偉
Byte-wise approximate matching has become an important field in computer science, with both practical value and theoretical significance. This talk will use six cases to define and describe the concept of approximate matching rigorously. They are identicalness, containment, cross-sharing, similarity, approximate containment and approximate cross-sharing. Based on the concept of approximate matching, one can propose a theoretical framework that consists of many problems of approximate matching, searching and clustering. Algorithmic solutions to the matching problems and their challenges will be outlined, along with theoretical analysis. This framework also includes some elements of our previous work on both the document fingerprinting problem and the mathematical evaluation of similarity digest schemes { TLSH, ssdeep, sdhash }. In the end, we will discuss applications in various security disciplines.
Overview of Data Loss Prevention (DLP) Technology
Liwei Ren 任力偉
DLP is a technology that detects potential data breach incidents in a timely manner and prevents them by monitoring data in use (endpoints), in motion (network traffic) and at rest (data storage). It has been driven by regulatory compliance and intellectual property protection. This talk will introduce DLP models that describe the capabilities and scope that a DLP system should cover. A few system categories will be discussed accordingly, with high-level system architecture. DLP is an interesting technology in that it provides advanced content inspection techniques. As such, a few content inspection techniques will be proposed and investigated in rigorous terms.
DLP Systems: Models, Architecture and Algorithms
Liwei Ren 任力偉
DLP is a data security technology that detects and prevents data breach incidents by monitoring data in use, in motion and at rest. It has been widely applied for regulatory compliance, data privacy and intellectual property protection. This talk will introduce basic concepts and security models to describe DLP systems with high-level architecture. DLP is an interesting discipline whose content inspection techniques are supported by sophisticated algorithms. A few algorithms will receive special attention: document fingerprinting, data record fingerprinting, scalable M-pattern string matching, etc.
Mathematical Modeling for Practical Problems
Liwei Ren 任力偉
Mathematical modeling is an important step in developing many advanced technologies in various domains such as network security, data mining, etc. This lecture introduces a process that the speaker has distilled from his past practice of mathematical modeling and algorithmic solutions in the IT industry, as an applied mathematician, algorithm specialist or software engineer, and even as an entrepreneur. A practical problem from a DLP system will be used as an example of creating math models and providing algorithmic solutions.
Securing Your Data for Your Journey to the Cloud
Liwei Ren 任力偉
In the era of cloud computing, data security is one of the main concerns in adopting cloud applications. In this talk, we will investigate a few general data security issues caused by cloud platforms: (a) data security and privacy for data residing in the cloud when using cloud SaaS or cloud apps; (b) data leaks to personal cloud apps directly from enterprise networks; (c) data leaks to personal cloud apps indirectly via BYOD devices.
Multiple technologies exist for solving these data security issues. They are CASB, Cloud Encryption Gateway, Cloud DLP, and even traditional DLP. Those products or services are ad hoc in nature. In the long term, general cloud security technologies such as FHE (fully homomorphic encryption) or MPC (multi-party computation) should be implemented when they become practical.