2. 1. Introduction
2. Objectives
a. Processes
b. Results
c. Other Contributions
3. Additional Items
4. One Ford Behaviors
5. Questions
3. About Me:
Senior at Oakland University
Major: Information Technology
Expected Graduation: April 2016
Extracurricular:
President of OU’s Nachale! Bollywood Dance Team
Member of OU’s ΑΦΩ Service Fraternity
Interesting Fact: Favorite Car – Mustang GT
4. FORD Summer Internship:
Department Name: Application Development
Hadoop CoE team
Duration: 12 weeks
Main objectives:
① Develop Pig & Hive scripts for the MS&S Clickstream Project
② Develop Proof of Concept applications for Hadoop
③ Recreate a simpler version of an existing DataStage CoE SharePoint site
5. FORD Experience: Skills Applied
Problem Solving
Analytical
Programming and Design
Team Player
Communication
6. FORD Experience: Skills Developed
General Skills
Hadoop concept and architecture
Customer relationships
Managing scope
Managing multiple tasks and projects
Technical Skills
Apache Hadoop, Pig, & Hive
Microsoft SharePoint
Microsoft Outlook
Cisco WebEx
7. 1. Develop Pig & Hive scripts for the MS&S Clickstream Project
BACKGROUND
i. What is clickstream?
ii. Where does the clickstream data for this project come from?
iii. How is this clickstream data used?
8. 1. Develop Pig & Hive scripts for the MS&S Clickstream Project
PROCESS
i. Use Hortonworks Training materials to prepare
ii. Simultaneously work on the assignment
iii. Have weekly meetings with Nick Matziuk for guidance
9. 1. Develop Pig & Hive scripts for the MS&S Clickstream Project
TECHNICAL PROCESS
Clickstream fields
15. 1. Develop Pig & Hive scripts for the MS&S Clickstream Project
RESULTS
1. Hive table called “clickstream” has been created
2. Data has been successfully stored in the “clickstream” table
16. 2. Develop Proof of Concept applications for Hadoop
PROCESS
i. Compute contextual ngrams using Twitter data
ii. Analyze the data and form general conclusions
17. 2. Develop Proof of Concept applications for Hadoop
TECHNICAL PROCESS
18. 2. Develop Proof of Concept applications for Hadoop
RESULTS
String most frequently used near “Ford”: “Mustang”
19. 2. Develop Proof of Concept applications for Hadoop
TECHNICAL PROCESS
20. 2. Develop Proof of Concept applications for Hadoop
RESULTS
String most frequently used before “Ford”: “Henry”
21. 3. Recreate a simpler version of an existing DataStage CoE SharePoint site
PROCESS
i. Complete SharePoint training
ii. Understand the requirements of the new SharePoint site being requested
iii. Outline the structure of the new SharePoint site
iv. Migrate contents of old site into the new site
v. Implement the new site
vi. Test and debug
22. 3. Recreate a simpler version of an existing DataStage CoE SharePoint site
MAJOR CHANGES
i. Fewer permission groups
ii. Re-created the “How Do I…” FAQ page
iii. Proposed the best way to maintain BIDS bulkmails
iv. Redirected Hadoop and DataStage Consulting Requests to BI System Design Consulting Requests
25. 3. Recreate a simpler version of an existing DataStage CoE SharePoint site
RESULTS
Simpler permission structure
Clean layout
Smooth navigation
Currently in use by the DataStage CoE
30. Extracurriculars at Ford:
• Summer Intern Ride & Drive
• IT Intern End of Summer Dinner
• OU Intern/FCG Networking Lunches
• Participated in team lunches
• Reached Digi-Worker Level 1
• Lunch with Steve Balaj
• One-on-one with Felicia Fields
• Emotional Intelligence Lunch & Learn hosted by Dr. Alan Fisk
31. Future Goals:
Acquire an FCG position
o Part of the Hadoop CoE
Master of Business Administration
Certification in Graphic Design
32.
1. Foster Functional and Technical Excellence
– Demonstrated functional and technical excellence by developing Pig & Hive scripts
2. Own Working Together
– Built strong relationships with team members
33.
3. Role Model Ford Values
– Showed a can-do, find-a-way attitude by finding a solution to a SharePoint permissions issue
4. Deliver Results
– Developed compelling and comprehensive plans, while keeping an enterprise view, by creating proposal and requirement documents for BIDS processes that required change
These are some skills I acquired through my education at OU and my past experience at DTE Energy.
MS&S: Marketing, Sales & Service
1. What is clickstream?
A clickstream is a record of the parts of the computer screen a user clicks on while browsing a website or using an application. For example, clickstream data can include the websites a user has visited, the videos they have watched on YouTube, or the types of links a particular user accesses most often on a particular website.
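To make the last example concrete, here is a small hypothetical sketch that finds each user's most-clicked link from a list of click records. The record format (user ID, clicked link) and the sample data are invented for illustration; real clickstream feeds carry many more fields.

```python
from collections import Counter, defaultdict

# Hypothetical click records: (user_id, clicked_link), one tuple per click.
clicks = [
    ("u1", "/mustang"),
    ("u1", "/f-150"),
    ("u1", "/mustang"),
    ("u2", "/lincoln/navigator"),
]

def most_accessed_links(records):
    """Map each user to the link they clicked most often."""
    per_user = defaultdict(Counter)
    for user, link in records:
        per_user[user][link] += 1
    # most_common(1) returns [(link, count)] for the top link
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}

print(most_accessed_links(clicks))  # {'u1': '/mustang', 'u2': '/lincoln/navigator'}
```

Counting per user with a `Counter` keeps the example short; on the real data this kind of aggregation would run in Hive rather than in Python.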
2. Where does the clickstream data for this project come from?
The clickstream data used for this project records user interactions with the Ford and Lincoln websites.
3. How is this clickstream data used?
Data analysts will analyze this collected data to gain insight into customers' interest in Ford products, which helps Ford improve its business and customer satisfaction.
The objective is to load data into a Hive table using Pig and Hive scripts so the data can be analyzed for the Clickstream project.
Some essential steps I took to complete this objective:
This is a sample of the fields that make up the Hive table.
This script creates the Hive table “clickstream”, with all 554 fields, in pdaram, a Hive database I created for my personal use.
This Pig script has three main functions:
1. It first loads all the data from the tab-delimited data file. Since the data file is tab delimited and contains escaped text, I used a custom Pig loader called the Escaped Text Data Loader.
2. It defines a few more fields to add to the Hive table. The first field records the complete file name, and this is the logic for that. The second field records only the base name; I use Pig's Substring and IndexOf functions to extract the base name from the complete file name. The third field records the date and time the data was loaded into the table; I use Pig's CurrentTime function to record the current timestamp.
3. Finally, I use the STORE operator to store the data, including the new fields, in the clickstream table in Hive.
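The three steps above can be sketched in Python to show the logic. This is a hypothetical illustration, not the Pig script itself: the escaping rule (a backslash makes the next character literal), the base-name rule (the text after the last "/" and before the next "."), and the sample file path are all assumptions.

```python
from datetime import datetime, timezone

def split_escaped_tsv(line):
    """Split a line on unescaped tabs.

    Assumed rule: a backslash makes the next character literal, so an
    escaped tab stays inside the field instead of starting a new one.
    """
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character as data
            i += 2
            continue
        if ch == "\t":
            fields.append("".join(current))  # an unescaped tab ends the field
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields

def base_name(full_name):
    """Index-of/substring style extraction: the text after the last '/'
    and before the next '.' (assumed file-naming convention)."""
    start = full_name.rfind("/") + 1
    dot = full_name.find(".", start)
    end = dot if dot != -1 else len(full_name)
    return full_name[start:end]

def to_row(line, full_name, loaded_at=None):
    """Return the parsed fields plus full file name, base name, and load time."""
    if loaded_at is None:
        loaded_at = datetime.now(timezone.utc)  # plays the role of Pig's CurrentTime()
    return split_escaped_tsv(line) + [
        full_name,
        base_name(full_name),
        loaded_at.isoformat(),
    ]

row = to_row(
    "123\tford.com\\\tmustang",
    "/data/clickstream/hit_data_2015-08-01.tsv",  # hypothetical path
    datetime(2015, 8, 1, tzinfo=timezone.utc),
)
print(row)
```

Here the escaped tab keeps "ford.com" and "mustang" in one field, and the three appended values correspond to the full file name, the base name "hit_data_2015-08-01", and the load timestamp.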
The POC uses Hive features to analyze data after it has been loaded, specifically the feature that counts the frequency of ngrams in the data.
This Hive query computes contextual ngrams, looking for the words most often used in close proximity to the word “Ford”.
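The idea behind a contextual ngram can be sketched in Python. In Hive, a query of this kind might look like `SELECT context_ngrams(sentences(lower(tweet)), array("ford", null), 10) FROM tweets;`, where the table and column names are assumptions. The sketch below is not Hive's implementation: Hive's built-in is designed to estimate top-k results over large data, while this version simply counts exactly, and its tokenizer is a simplified stand-in for `sentences()`. The `None` slot marks the word being counted, so the same function covers both the words after “Ford” and the words before it. The sample tweets are made up.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a tweet and split it into words (simplified stand-in for Hive's sentences())."""
    return re.findall(r"[a-z0-9']+", text.lower())

def context_ngrams(texts, pattern, k):
    """Return the k most frequent words filling the single None slot in pattern.

    ["ford", None] counts words right after "ford";
    [None, "ford"] counts words right before it.
    """
    slot = pattern.index(None)
    n = len(pattern)
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            # Fixed words must match exactly; the None position matches anything.
            if all(p is None or p == w for p, w in zip(pattern, window)):
                counts[window[slot]] += 1
    return counts.most_common(k)

# Made-up sample tweets, for illustration only.
tweets = [
    "Just drove the new Ford Mustang!",
    "The Ford Mustang GT is a beast",
    "Henry Ford changed the world",
    "Reading about Henry Ford today",
]

print(context_ngrams(tweets, ["ford", None], 1))  # [('mustang', 2)]
print(context_ngrams(tweets, [None, "ford"], 1))  # [('henry', 2)]
```

These outputs come only from the invented sample tweets, not from the project's Twitter data.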
The string that most frequently appears in close proximity to the word “Ford” is “Mustang”. According to this data, the Mustang appears to be the most talked-about Ford car on Twitter.
This second Hive query also computes contextual ngrams around the word “Ford”, but this time it looks for the words that come before it.
Computing contextual ngrams for the words most frequently used before “Ford” shows that Henry Ford, the founder of Ford Motor Company, is the most talked-about person with the last name Ford on Twitter. The second most talked-about is the Hollywood actor Harrison Ford.
Analysis: This data suggests that Twitter users tend to pay more attention to people of generally higher importance, whether in history or in today’s pop culture. The third most talked-about person with the last name Ford, Tom Ford, a popular American fashion designer and film director, further supports this analysis.
These are some highlights of the major changes I made to the new SharePoint site.
1.
2.
3. Drafted a proposal for a new, more efficient way to manage all bulk mails from one universal page that all BIDS CoE teams can access. Any user interested in subscribing to a particular CoE’s bulk mail would also be directed to this page.
4. Worked with Gayathri and Al Brouilette to find the best way to automate the BIDS Sandbox Access Request Process. We have gathered several options, but no decision has been made yet on how to implement it.
5. Worked with Deepika, Brian Kazmier, and Al to modify the BISD Engagement Request form so that the BI System Design team can act as the single point of contact (SPOC) for the customer. The main goal is to change the existing BISD request process so that it acts as an umbrella that collects all types of BIDS requests and redirects them to the appropriate CoE teams. Another requirement is to automate the process as much as possible, which involves creating certain workflows. The GUI has been modified; once the workflows are created, the process will be ready to use.
I was the publisher of all DataStage and Hadoop documents for the month of August, and sent weekly emails to the DataStage and Hadoop bulkmails to keep people updated with new information.
Participated in every weekly CoE meeting throughout my internship. I also had the opportunity to lead a couple of the CoE meetings and to serve as scribe for one of them.
Participated in most of the Standards Benchmarking meetings we had with IBM, and took part in the review process for one of the Standards documents.
(3-4), specific areas interested in
1. Planning to return to Ford as an FCG to start my career on one of the best paths.
I also hope to work with the Hadoop CoE team again: I feel I’ve gotten a great start, and with more time I could dive even deeper into the subject and gain a complete understanding of it.
2. Planning to earn a Master of Business Administration in the next few years.
3. I would like to become certified as a Graphic Design Professional, since graphic design has always been one of my passions.
1. Demonstrated and built functional and technical excellence by applying what I learned in the Hortonworks Hadoop Training to develop Pig and Hive scripts that load data into a Hive table. I also used this knowledge to perform a basic analysis of Twitter data as part of my POC, using the contextual ngrams function in Hive.
2. Built strong relationships with team members. These relationships proved to be my greatest strength during this internship, because I gained a great deal of knowledge and insight from my team members. Every one of them was willing to help when I needed it. For example, although Al Brouilette isn’t on my immediate team, he spent a great deal of time and effort helping me create workflows for the BISD Engagement Request Form and giving me tips on the best way to automate the BIDS Sandbox Access Request Process.
3. I have shown a can-do, find-a-way attitude. For example, one of the main changes John requested for the new DataStage SharePoint site was a change to the permissions of the Discussion Board SharePoint Group. We realized the permissions for that group were inappropriate when a member of the group accidentally deleted all entries on the Discussion Board. I was determined to find a way to resolve this issue, so I did some research and asked Al Brouillette for his thoughts. After consulting with Al, I realized that it is in fact possible to create a custom permission level for a group. Using this information, I changed the permission level of the Discussion Board Group. Previously, members of the group could add, modify, and delete items. With the new custom permission level, Discussion Board users may only add and modify their own entries; they cannot delete any entries. This solution will prevent similar accidents in the future.
4. Developed compelling and comprehensive plans, while keeping an enterprise view, by drafting a couple of proposal and requirement documents for BIDS processes that needed to change. For example, I created a proposal document for a change to the BISD Engagement Request form that would make the BISD team the SPOC for the customer.
Questions to think about:
How will Ford be able to use the work you have done here?
Think about some criticism for Ford: things Ford, or the team, can improve upon.
What is one thing you liked better at DTE that could be improved at Ford?
If I could change one thing about Ford, what would it be?
Some things Ford can improve: My only real complaint is the cubicle arrangement. If team rooms had windows, or were arranged so that a person’s cube didn’t face the wall, it would be easier to sit and work there for 8 hours.
Giving interns the option to work from home once in a while would be a great addition. It would better prepare interns for a real job at Ford, because expectations and supervision are generally higher when an employee works from home, and it would have been great to experience that a couple of times during this internship.
What was one of the biggest challenges?
One of my biggest challenges was definitely managing my time wisely and multitasking. This job called for a lot of that, and I am glad it did. I was given many tasks at once, all with deadlines, so I had to learn to juggle tasks and prioritize them in the way that would get the most work done on time.
What is the most interesting thing I worked on at Ford and why?
The most interesting thing I worked on at Ford was definitely the Pig & Hive scripts. Programming has always interested me, mainly because of the instant results, which I find really rewarding.