Data Mining



  1. 1. Data Mining By: Chad Gregg John Wilder John Mary Lugemwa Yao Yao Bu
  2. 2. Our Presentation: <ul><li>General Overview/Brief History </li></ul><ul><li>The Brushing Technique </li></ul><ul><li>The Technique of Using Neural Networks </li></ul><ul><li>Data Mining & Privacy </li></ul><ul><li>Current Applications & Future Possibilities </li></ul>
  3. 3. Overview & History <ul><li>Data Mining: </li></ul><ul><ul><li>The process of finding patterns or correlations among data </li></ul></ul><ul><li>Evolution: </li></ul><ul><ul><li>Classical Statistics </li></ul></ul><ul><ul><li>Artificial Intelligence </li></ul></ul><ul><ul><li>Machine Learning </li></ul></ul>
  4. 4. Goals of Data Mining <ul><li>Prediction </li></ul><ul><li>Identification </li></ul><ul><li>Classification </li></ul><ul><li>Optimization </li></ul>
  5. 5. The Brushing Technique <ul><li>Graphical exploratory data analysis </li></ul><ul><li>-- visualize relations between variables </li></ul><ul><li>Could be 2D lines or 3D surfaces </li></ul><ul><li>Animated brushing and automatic function refitting </li></ul>
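A minimal sketch of the idea behind brushing, assuming it means selecting a range of one variable and highlighting the same records in another view. The record fields (`age`, `income`, `spend`) and the function names `brush` and `linked_view` are hypothetical and are not taken from the slides; a real tool would draw the result instead of returning tuples.

```python
# Hypothetical records; each dict is one observation with three variables.
records = [
    {"age": 25, "income": 30, "spend": 5},
    {"age": 40, "income": 60, "spend": 9},
    {"age": 55, "income": 80, "spend": 7},
    {"age": 33, "income": 45, "spend": 6},
]


def brush(records, variable, low, high):
    """Return the indices of records whose `variable` lies in [low, high]."""
    return {i for i, r in enumerate(records) if low <= r[variable] <= high}


def linked_view(records, x, y, selected):
    """Build the points of a second plot, flagging the brushed records."""
    return [(r[x], r[y], i in selected) for i, r in enumerate(records)]


# Brush the age axis, then look at the same records in an income/spend view.
selected = brush(records, "age", 30, 45)
points = linked_view(records, "income", "spend", selected)
for x_value, y_value, highlighted in points:
    marker = "*" if highlighted else " "
    print(f"{marker} income={x_value} spend={y_value}")
```

The selection is stored as a set of record indices so that any number of views can be highlighted from the same brush.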
  6. 6. Pictures are easier to read
  7. 7. Types of visualization
  8. 8. Automated graphics
  9. 9. Applications of the Brushing Technique
  10. 10. Neural Networks <ul><li>A technique with roots in Cognitive Science and Artificial Intelligence </li></ul><ul><li>Data Mining adopted this technique as its own </li></ul><ul><li>What exactly are they? </li></ul>
  11. 11. Learning <ul><li>Repeatedly show input with correct output </li></ul><ul><li>Similar to a human brain </li></ul><ul><ul><li>Changes neuron weights </li></ul></ul><ul><ul><li>Changes firing threshold </li></ul></ul><ul><li>Over-learning problem </li></ul>
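The learning loop on this slide can be sketched with a single artificial neuron. This is an illustrative perceptron, assumed here as the simplest case and not necessarily the network the presenters had in mind; the names `train_perceptron`, `predict`, and `AND_SAMPLES` are hypothetical. Each time the neuron gives the wrong output for a shown input, its weights and firing threshold are adjusted.

```python
def predict(weights, threshold, inputs):
    """The neuron fires (returns 1) when the weighted input exceeds the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Repeatedly show each input with its correct output and adjust the neuron.

    samples: list of (inputs, target) pairs, where target is 0 or 1.
    """
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, threshold, inputs)
            # Change the neuron weights in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            # Change the firing threshold: lower it if the neuron should have fired.
            threshold -= learning_rate * error
    return weights, threshold


# Logical AND as a tiny training set (an assumption made for illustration).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_perceptron(AND_SAMPLES)
for inputs, target in AND_SAMPLES:
    print(inputs, "->", predict(weights, threshold, inputs), "expected", target)
```

A single neuron like this can only learn linearly separable patterns, so it illustrates the update rule and not a full network.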
  12. 15. Possible Improvements <ul><li>Make them autonomous </li></ul><ul><li>Use a more correct model of the brain </li></ul><ul><li>Questions? </li></ul>
  13. 16. Data Mining & Privacy <ul><li>What is Privacy? </li></ul><ul><li>Data being mined </li></ul><ul><li>Data Mining: Tracking Terrorist Activities </li></ul><ul><li>A heated debate; what are the issues? </li></ul><ul><li>Proposed solutions </li></ul><ul><li>Challenges </li></ul>
  14. 17. The kinds of data being mined <ul><li>Data includes information such as: </li></ul><ul><ul><ul><li>credit reports </li></ul></ul></ul><ul><ul><ul><li>credit card information and transactions </li></ul></ul></ul><ul><ul><ul><li>student loan applications </li></ul></ul></ul><ul><ul><ul><li>bank account numbers </li></ul></ul></ul><ul><ul><ul><li>taxpayer identification numbers and similar information </li></ul></ul></ul><ul><ul><ul><li>medical records </li></ul></ul></ul>
  15. 18. Data Mining: Tracking Terrorist Activities <ul><li>Federal Government’s efforts to hunt terrorists after 9/11: </li></ul><ul><ul><li>Data is collected from both federal agency and private sector databases. </li></ul></ul><ul><ul><li>United States General Accounting Office (GAO) report: </li></ul></ul><ul><ul><ul><li>Of the 199 data mining efforts identified, 122 involved personal information. </li></ul></ul></ul><ul><ul><ul><li>Private sector: of the 54 data mining efforts, 36 involved personal information. </li></ul></ul></ul><ul><ul><ul><li>Federal agencies: of the 77 efforts identified, 46 relied on personal information. </li></ul></ul></ul>
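As a quick check on the GAO figures above, the counts can be turned into percentages. The numbers are taken from the slide; the dictionary labels and the helper name `share` are illustrative only.

```python
# (efforts involving personal information, total efforts), as quoted on the slide.
gao_counts = {
    "All efforts": (122, 199),
    "Private sector": (36, 54),
    "Federal agencies": (46, 77),
}


def share(part, total):
    """Percentage of `total` represented by `part`, rounded to one decimal place."""
    return round(100 * part / total, 1)


shares = {label: share(part, total) for label, (part, total) in gao_counts.items()}
for label, percent in shares.items():
    print(f"{label}: {percent}% involved personal information")
```

In each of the three groups, roughly six in ten efforts or more involved personal information.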
  16. 19. Privacy Concerns <ul><li>Violation of privacy and the sense of personal freedom protected by the Fourth Amendment. </li></ul><ul><li>Some information is too personal to be used for purposes other than those for which it was originally intended. </li></ul><ul><li>People’s private lives are subjected to unreasonable public scrutiny. </li></ul><ul><li>Imminent danger that some patterns may inaccurately match a criminal profile, which may lead to unreasonable charges and arrests. </li></ul><ul><li>Security concerns </li></ul>
  17. 20. Federal Govt. Data Mining Programs <ul><li>MATRIX </li></ul><ul><ul><li>Multistate Anti-Terrorism Information Exchange </li></ul></ul><ul><li>TIA </li></ul><ul><ul><li>Total Information Awareness </li></ul></ul>
  18. 21. What can be done <ul><li>What Canada is proposing: </li></ul><ul><li>Three optional waivers offered to customers when they give out their information: </li></ul><ul><ul><li>State that no data mining be allowed on the customer’s data; </li></ul></ul><ul><ul><li>Allow data mining only for internal use; </li></ul></ul><ul><ul><li>Authorize internal or external use of the data. </li></ul></ul>
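One way to picture the three waiver options is as a consent level stored with each customer record and checked before any mining job runs. This is only a sketch under that assumption; the names `MiningConsent` and `may_mine` are hypothetical and do not come from the Canadian proposal.

```python
from enum import Enum


class MiningConsent(Enum):
    """The three options a customer could choose when giving out information."""
    NONE = "no data mining allowed"
    INTERNAL_ONLY = "data mining for internal use only"
    INTERNAL_OR_EXTERNAL = "internal or external use authorized"


def may_mine(consent, external):
    """Return True if a mining job is permitted under the customer's choice.

    external: True when results would be used outside the collecting organization.
    """
    if consent is MiningConsent.NONE:
        return False
    if consent is MiningConsent.INTERNAL_ONLY:
        return not external
    return True


for consent in MiningConsent:
    print(consent.name,
          "internal:", may_mine(consent, external=False),
          "external:", may_mine(consent, external=True))
```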
  19. 22. Challenges to the solution <ul><li>Even if customers are informed beforehand about the purpose of the data, data mining extracts hidden patterns and rules, so it is hard to predict what relationships will emerge. </li></ul><ul><li>The Federal Government’s exploitation of private information in the interest of national security will continue to raise mixed concerns about privacy. </li></ul><ul><li>If privacy issues are not adequately addressed, users of data mining technologies will be exposed to a wide range of legal challenges as the general public becomes more alert to the potential misuse of personal data generated from data mining. </li></ul>
  20. 23. Current Applications <ul><li>Public Sector </li></ul><ul><ul><li>Law Enforcement </li></ul></ul><ul><ul><li>Policy Analysis </li></ul></ul><ul><li>Private Sector </li></ul><ul><ul><li>Business Analysis Tool </li></ul></ul>
  21. 24. Future Possibilities <ul><li>Government Policies </li></ul><ul><li>Commercial Database Systems of the Future </li></ul><ul><li>Growing Research Opportunities </li></ul>
  22. 25. Learn More: <ul><li>Questions? </li></ul><ul><li>Additional Resources </li></ul><ul><ul><li>Chapter 27 of our Text </li></ul></ul>
  23. 26. Additional Resources: <ul><li>Overview/History/Current Applications/Future Possibilities: </li></ul><ul><li>United States. Congress. House. Committee on Government Reform. Subcommittee on Technology, Information Policy, Intergovernmental Relations, and the Census. Data Mining: Current Applications and Future Possibilities: hearing before the Subcommittee on Technology, Information Policy, Intergovernmental Relations and the Census of the Committee on Government Reform, House of Representatives, One Hundred Eighth Congress, 25 March 2003. </li></ul><ul><li> < > Accessed on 3 December 2005. </li></ul><ul><li>This source is from a government hearing and contains an excellent overview of what data mining is, its history, data mining techniques, current applications, and thoughts on what data mining will be like in the future. It is a fairly lengthy document, but it is easy to skip between sections and follow the discussion. It gives an excellent understanding of data mining from a reliable source. I found the discussion of the current applications of data mining especially useful. </li></ul>
  24. 27. <ul><li>The Brushing Technique: </li></ul><ul><li>Wong, Pak Chung, and R. Daniel Bergeron. &quot;Brushing Techniques for Exploring Volume Datasets.&quot; VIS '97: Proceedings of the 8th Conference on Visualization '97. Phoenix, Arizona, United States. < > </li></ul><ul><li>This article, written in the Department of Computer Science at the University of New Hampshire, describes several brushing techniques: qualitative brushing, planar brushing, and volume brushing. </li></ul>
  25. 28. <ul><li>The Technique of Using Neural Networks: </li></ul><ul><li>Hill, T. & Lewicki, P. STATISTICS: Methods and Applications. (2006) StatSoft, Tulsa, OK </li></ul><ul><li>This is a book about statistics that includes a chapter on data mining, providing a general overview and a short introduction to several techniques used in data mining. It also includes chapters on several of the techniques that give a more in-depth explanation. The chapter of focus for me is Neural Networks. It gives a basic overview of artificial neural networks and their relation to the human brain and nervous system, with a brief discussion of human nerve function and firing thresholds. The authors then discuss how neural networks learn and roughly how much information needs to be shown for it to be learned well. The book then touches on how over-learning can cause problems, comparing it to fitting a polynomial of too high a degree in linear regression. The authors also give examples of several uses of neural networks and some problems associated with them. </li></ul><ul><li>Roy, Asim. Artificial Neural Networks - A Science in Trouble. (2000) SIGKDD Explorations, volume 1, issue 2, 33. </li></ul><ul><li>This article discusses shortcomings of the current concept of neural networks. The author discusses inaccuracies in the current neural network model and claims that the model needs to change to become more similar to the human brain. He also discusses what he believes to be over-optimism about what neural networks can do under the current model, and argues for creating a neural network that is autonomous, so that it does not always need humans to feed in the inputs and outputs in order to learn. </li></ul>
  26. 29. <ul><li>Data Mining & Privacy: </li></ul><ul><li>United States General Accounting Office. Data Mining: Federal Efforts Cover a Wide Range of Uses. “Report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate” 24 May 2004. <http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=gao&docid=f:d04548.pdf> Accessed on 2 December 2005. </li></ul><ul><li>This source is a government report on data mining. It reviews the understanding and current use of data mining technologies. The report highlights the purposes of data mining efforts in government departments and agencies and provides inventories of data mining efforts by department. It also considers privacy concerns about the government’s use of data mining technology, highlighting them in light of the exploitation of personal information. The source gave me a good understanding of the privacy concerns from a reliable source. </li></ul><ul><li>American Civil Liberties Union Technology and Liberty Program. Total Information Compliance: The TIA’s Burden Under the Wyden Amendment. May 2003. </li></ul><ul><li>< http:// > Accessed on 2 December 2005. </li></ul><ul><li>This source reviews a proposed information system called Total Information Awareness (TIA) and tries to alert the public to the dangers the program could pose to personal privacy. TIA would monitor all transactions made by Americans in both corporate and government databases around the world. The program would have the capacity to track personal medical records, credit records, shopping patterns, travel arrangements, personal finances, and related information. Data mining techniques would then exploit this data to find patterns that supposedly link to potential terrorist activities. </li></ul><ul><li>Cavoukian, Ann. Data Mining: Staking a Claim on Your Privacy. Information and Privacy Commissioner/Ontario, January 1998 </li></ul><ul><li>http:// </li></ul><ul><li>This document is a reaction to the use of data mining in ways that violate personal privacy rights. Cavoukian is the Information and Privacy Commissioner of Ontario, Canada, and she provides some suggestions on how to address these challenges. The report recommends offering customers at least three waiver options when they give out their information: first, to state that no data mining be allowed on the customer’s data; second, to allow data mining only for internal use; and lastly, to authorize internal or external use of the data. It helped me get an idea of what can be done to address the problem. </li></ul>