Xin (Sean) Hao
401 Shady Avenue, Kenmawr Apartment B607, Pittsburgh, PA 15206
Tel: (412) 708-4337 · Email:

EDUCATION

Carnegie Mellon University, Pittsburgh, USA (May 2015)
Master of Information Systems Management, GPA: 3.88/4.33

Sun Yat-sen University, Guangzhou, China (June 2013)
Bachelor's Degree in Software Engineering, GPA: 3.8/4.0

WORK EXPERIENCE

► SingTel, Singapore (May - Aug. 2014)
Data Analyst (Scientist) Intern, DataSpark Team
Datamart Project – Large-scale Data Aggregation, Architecture Designer, Primary Developer
  • Designed and implemented a key computing engine for the geolocation data insight product of the SingTel DataSpark team.
  • Aggregated users' demographic data with telecom geolocation data, then integrated the information by map subzone.
  • Decreased processing time for one day of data (a billion records) from 7 hours to 1 hour using Hadoop (MapR and Cloudera distributions), saving 300 hours in total across the entire data calculation.

► Tencent, Shenzhen, China (July - Sept. 2012)
Backend Software Engineer Intern, Data Analysis Team
DataTrans Project – Data Pipeline, Architecture Designer, Main Developer
  • Designed and implemented a data migration and processing tool using Hadoop to handle TB-scale data.
  • Fetched data from MySQL, transferred it to the Hadoop file system, and stored the results in HBase.
  • Built the data pipeline among HDFS, MySQL, and HBase. Practiced Java, Hadoop-based programming, HBase, and SQL skills in this project.

PROJECT EXPERIENCE

Twitter Data Analysis Web Server on AWS Platform, Pittsburgh, USA (Sept. 2014 - Dec. 2014)
Back-end/Hadoop Developer
  • Analyzed 1 TB of tweet data with AWS Elastic MapReduce, covering user relationships, retweets, locations, and hashtags.
  • Imported data into MySQL and a scalable HBase store. Optimized the databases to speed up query responses.
  • Built web servers using the JBoss framework on Amazon EC2 with Elastic Load Balancing to withstand stress testing.
  • With the help of partitioning, caching, and load-balancing techniques, our web servers handled more than 10,000 queries per second while maintaining 100% correctness.

Search Engine Implementation on ClueWeb09 Dataset, Pittsburgh, USA (Jan. 2015 - Present)
Primary Developer
  • Design and implement a query parser that converts general query strings into search engine operations and terms.
  • Search a corpus of more than 3 GB through inverted lists and score lists using Ranked/Unranked Boolean, BM25, and Indri algorithms, retrieving documents with high precision and recall in seconds.
  • Use object-oriented design patterns such as the Factory pattern to reuse code and keep the structure tight.

TwittStory Mobile App, Adelaide, Australia (Sept. - Dec. 2013)
Development Team, Developer
  • Developed a cross-platform, web-based mobile app.
  • Implemented login, search, data fetching, and visual display features for this Twitter analysis mobile app.
  • Practiced HTML, CSS, JavaScript, jQuery, jQuery Mobile, PhoneGap, and Cordova.

SKILLS

Computer Programming Skills: Java, Python, Android, JavaScript, HTML/CSS, SQL, C/C++
Fields of Interest: Hadoop, Cloud Platform, Large-scale Data Mining, Machine Learning
Languages: English, Chinese