3. A new look
• Based on research. Thank you to those who participated!
• Improved Data Lake messaging and focus
• Clear customer advantages
• New Getting Started section for newbies
• Higher level home page
• More detailed About page
Virtual Ribbon Cutting - HPCCSystems.com 3
4. New languages for a growing international audience
• Simplified Chinese
• Spanish
• Brazilian Portuguese
5. New overview video
https://www.youtube.com/watch?v=FDuCuDRy1wU
6. Very special call out to the design team
• Arjuna Chala
• Bob Foreman
• Chris Lo
• Dan Camper
• Flavio Villanustre
• Jeremy Clements
• Jim DeFabia
• Lili Xu
• Richard Taylor
• Roger Dev
• Sharon Vanairsdale
• Stuart Ort
• Trish McCall
7. Very special call out to the translations team
• Yun Chen
• Lin Guo
• Rodrigo Pastrana
• Joffre Jatem
• Hugo Watanuki
9. View this presentation on YouTube:
https://www.youtube.com/watch?v=Z1A3nOuhv3A&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=11&t=43s (1:36:33)
Editor's Notes
We are living in the era of big data. We have large volumes of data from many domains and sectors, for example medical, political, industrial, and financial datasets.
To find trends or discover hidden structure, we first need to preprocess the data.
Text cleaning is a crucial technique in any data mining or NLP task.
One important step in text cleaning is stop word removal, also called stop word reduction, which eliminates noise words that are irrelevant to the context or not predictive. Because these words carry little information content, we do not need them and can eliminate them.
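As a rough illustration of the stop word removal step described above, here is a minimal Python sketch. The stop word list, the `tokenize` helper, and the `remove_stop_words` function are all hypothetical names chosen for this example; a real pipeline would normally use a much larger list.

```python
import re

# Hypothetical, deliberately tiny stop word list (an assumption for this
# sketch, not a standard list).
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to", "we"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that do not appear in the stop word list."""
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("We are living in the era of big data")
filtered = remove_stop_words(tokens)
print(filtered)
```

Storing the list as a `frozenset` keeps each membership check cheap, which matters when the same filter runs over millions of tokens.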
By eliminating these words, we save a large amount of space in text indexing.
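To make the indexing claim concrete, the sketch below builds a toy inverted index twice, once with every word and once with stop words filtered out, and compares the number of stored postings. The three documents, the stop word list, and the function names are invented for illustration; actual savings depend on the corpus and the list used.

```python
from collections import defaultdict

# Hypothetical stop word list and toy corpus (assumptions for this sketch).
STOP_WORDS = frozenset({"the", "of", "is", "a", "in", "and", "to"})

DOCS = {
    1: "the price of the stock is rising",
    2: "the patient is in the clinic",
    3: "a report of the election",
}


def build_index(docs, stop_words=frozenset()):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token not in stop_words:
                index[token].add(doc_id)
    return index


def posting_count(index):
    """Total number of (term, document) pairs stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


full_index = build_index(DOCS)
reduced_index = build_index(DOCS, STOP_WORDS)
print("terms:", len(full_index), "->", len(reduced_index))
print("postings:", posting_count(full_index), "->", posting_count(reduced_index))
```

On this toy corpus the filtered index stores fewer than half the postings of the full one, because the stop words are exactly the terms that occur in nearly every document.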
Most researchers use a standard stop word list to remove words that carry little information content. These words are general, so the same list is applied to any dataset regardless of its domain.