1. Big Data + Sentiment Analysis = Awesome
Adel Rahimi
Sharif University of Technology
2. TABLE OF CONTENTS
• Introduction to big data and its usage
• Sentiment analysis and its use in NLP
• How to do big data?!
• Tools to use
• Further study
5. WHAT IS BIG DATA?
• Big data is a term denoting the storage and usage of vast amounts of data, either structured or unstructured, often in the cloud.
6. USAGES OF BIG DATA
• Internet Search
• Finance
• Business Informatics
7. SPECIFICATIONS OF BIG DATA
• Volume: big data doesn't sample; it just observes and tracks what happens
• Velocity: big data is often available in real time
• Variety: big data draws from text, images, audio, video; plus it completes missing pieces through data fusion
• Machine learning: big data often doesn't ask why and simply detects patterns
• Digital footprint: big data is often a cost-free byproduct of digital interaction
8. COMPANIES WHO USE BIG DATA
• eBay.com uses two data warehouses at 7.5 petabytes and 40PB as well as a 40PB Hadoop cluster for search, consumer recommendations, and merchandising.
• Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers. The core technology that keeps Amazon running is Linux-based and as of 2005 they had the world's three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
• Facebook handles 50 billion photos from its user base.
• Google was handling roughly 100 billion searches per month as of August 2012.
• Oracle NoSQL Database has been tested past the 1M ops/sec mark with 8 shards and proceeded to hit 1.2M ops/sec with 10 shards.
10. APPLICATIONS OF BIG DATA (II)
• Fashion Trends 2016: Google Data
Shows What Shoppers Want
In April, searches for bomber jackets grew
297% YoY in the U.K. and 612% YoY in the
U.S.
11. APPLICATIONS OF BIG DATA (III)
• IN CASE YOU WERE WONDERING WHAT EXACTLY A “BOMBER JACKET” IS!
13. ADVANTAGES OF BIG DATA
• Cheap and mass storage
• Faster processors
• Cheap open source platforms such as Hadoop
• Cloud computing is a huge advancement in the field when dealing with Big Data
• Parallel processing, large grid environments and high connectivity
14. HOW WILL BIG DATA HELP US?
• Predict what customers want before they ask for it
• Get customers excited about their own data
• Improve customer service interactions
• Identify customer pain points and solve them
• Reduce health care costs and improve treatment
20. PREPROCESSING
• Remove non-English tweets
• Replace emoticons with their polarity
• Remove URLs, target mentions, hashtags and numbers
• Replace negative mentions
• Replace sequences of repeated characters, e.g. ‘coooooooool’ by ‘coool’
• Remove nouns and prepositions
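A few of the steps above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the talk: the emoticon table, the negation list and the `NOT` placeholder token are assumptions made for the example, and the language filter and the part-of-speech step (removing nouns and prepositions) are left out because they need an external language detector and tagger.

```python
import re

# Hypothetical emoticon-to-polarity table; a real system would use a larger list.
EMOTICON_POLARITY = {
    ":)": "POSITIVE", ":-)": "POSITIVE", ":D": "POSITIVE",
    ":(": "NEGATIVE", ":-(": "NEGATIVE",
}

# Hypothetical negation words, all replaced by a single placeholder token.
NEGATIONS = {"not", "no", "never", "cannot"}


def preprocess(tweet: str) -> str:
    """Apply emoticon, URL/mention/hashtag/number, negation and repeat rules."""
    tokens = []
    for token in tweet.split():
        # Replace emoticons by their polarity.
        if token in EMOTICON_POLARITY:
            tokens.append(EMOTICON_POLARITY[token])
            continue
        # Remove URLs, target mentions (@user) and hashtags.
        if re.match(r"(https?://|www\.|@|#)", token):
            continue
        word = token.lower().strip(".,!?;:\"'()")
        # Remove numbers and tokens that were only punctuation.
        if not word or word.isdigit():
            continue
        # Replace negative mentions with one placeholder.
        if word in NEGATIONS or word.endswith("n't"):
            tokens.append("NOT")
            continue
        # Squeeze runs of three or more identical characters down to three.
        word = re.sub(r"(.)\1{2,}", r"\1\1\1", word)
        tokens.append(word)
    return " ".join(tokens)
```

For instance, `preprocess("coooooooool http://t.co/x #fun 2016 isn't bad")` keeps only the squeezed word, the negation placeholder and the last word.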
21. EXAMPLE OF TWITTER SENTIMENT ANALYSIS
• Tweet: "@BonksMullet @chet_sellers This is very accurate and hilarious. Well done :)"
• WSD (word sense disambiguation) selects the synset accurate#1: conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale"
• SentiWordNet score for this synset: Pos_score = 0.5, Neg_score = 0, Obj_score = 0.5
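The lookup in this example can be sketched in Python. The snippet below is only an illustration under stated assumptions: `LEXICON` is a one-entry stand-in for SentiWordNet holding the scores shown on the slide, and `disambiguate` is a hypothetical placeholder that always picks the first sense rather than a real WSD algorithm. It relies on the SentiWordNet convention that the positive, negative and objective scores of a synset sum to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SentiScore:
    pos: float
    neg: float

    @property
    def obj(self) -> float:
        # In SentiWordNet the three scores of a synset sum to 1.
        return 1.0 - (self.pos + self.neg)

    @property
    def polarity(self) -> float:
        # Net polarity used here: positive minus negative score.
        return self.pos - self.neg


# Stand-in lexicon: only the synset from the slide, with the slide's scores.
LEXICON = {"accurate#1": SentiScore(pos=0.5, neg=0.0)}


def disambiguate(word: str, context: List[str]) -> str:
    """Hypothetical WSD placeholder: always returns the first sense."""
    return f"{word}#1"


def tweet_polarity(tokens: List[str]) -> float:
    """Sum the net polarity of every token whose synset is in the lexicon."""
    total = 0.0
    for token in tokens:
        score = LEXICON.get(disambiguate(token, tokens))
        if score is not None:
            total += score.polarity
    return total
```

Summing per-synset polarity is one simple aggregation choice; averaging or weighting by objectivity are equally plausible alternatives.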
23. WORDNET
A lexical database of English that groups words into sets of synonyms (synsets) and records the relations between them.
The Persian equivalent of WordNet is FarsNet, available from Shahid Beheshti University.
http://dadegan.ir/catalog/farsnet
25. AFINN
• AFINN is a list of English words rated by their sentiment, from -5 (negative) to +5 (positive).
• AFINN-111 contains 2477 words.
• Examples:
Abilities 2
Ability 2
Aboard 1
Absentee -1
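A word list of this kind can be used to score a text by summing the ratings of the words it contains. The sketch below assumes only the four entries shown above (`AFINN_SAMPLE`); the function names and the zero threshold for labels are illustrative choices, not part of AFINN itself.

```python
# The four sample entries from the slide; the full AFINN-111 list is much larger.
AFINN_SAMPLE = {"abilities": 2, "ability": 2, "aboard": 1, "absentee": -1}


def afinn_score(text: str) -> int:
    """Sum the ratings of all known words; unknown words count as 0."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return sum(AFINN_SAMPLE.get(word, 0) for word in words)


def label(score: int) -> str:
    """Map a summed score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this sample, `afinn_score("All aboard! Her ability is clear.")` adds the rating of "aboard" to that of "ability".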
29. APACHE HADOOP
• Hadoop uses the MapReduce programming model for distributed batch processing of large datasets; it is scalable and fault-tolerant.
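The map, shuffle and reduce phases can be imitated on a single machine to show the idea. The following Python sketch is not Hadoop code and uses no Hadoop API: `TOY_LEXICON` and all function names are hypothetical, and the job simply counts tweets per sentiment label.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical two-word lexicon, used only to keep the example self-contained.
TOY_LEXICON = {"good": 1, "bad": -1}


def map_tweet(tweet: str) -> List[Tuple[str, int]]:
    """Map step: emit one (label, 1) pair per tweet."""
    score = sum(TOY_LEXICON.get(word, 0) for word in tweet.lower().split())
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return [(label, 1)]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def run_job(tweets: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
```

On a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; here everything runs sequentially for clarity.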
30. APACHE SPARK
• Apache Spark is a fast and general engine for big data
processing, with built-in modules for streaming, SQL,
machine learning and graph processing.
31. APACHE CASSANDRA
• The Apache Cassandra database is the right choice
when you need scalability and high availability without
compromising performance.
32. REFERENCES AND FURTHER STUDY
• What Is Big Data? | SAS. (n.d.). Retrieved from http://www.sas.com/en_us
• 5 ways companies are using big data to help their customers | VentureBeat | Business | by brianabillingham. (n.d.). Retrieved from http://venturebeat.com/2014/04/21/5-ways-big-data-is-helping-companies-help-their-customers/
• http://sentiwordnet.isti.cnr.it/
• SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining
• https://github.com/linkTDP/BigDataAnalysis_TweetSentiment
33. REFERENCES AND FURTHER STUDY
• AFINN-111 - http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
• Reviews Classification Using SentiWordNet Lexicon - http://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordNet_Lexicon
• Using SentiWordNet and Sentiment Analysis for Detecting Radical Content on Web Forums - http://www.jeremyellman.com/jeremy_unn/pdfs/1_____Chalothorn_Ellman_SKIMA_2012.pdf
• From tweets to polls: Linking text sentiment to public opinion time series - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
Editor's Notes
Sentiment analysis allows businesses to track:
- Flame detection (bad rants)
- New product perception
- Brand perception
- Reputation management
Identifying child-suitability of videos based on comments
Bias identification in news sources
Identifying (in)appropriate content for ad placement
Question: “Why aren't consumers buying our laptop?”
We know the concrete data: price, specs, competition, etc.
We want to know subjective data: “the design is tacky,” “customer service was condescending”
Misperceptions are also important, e.g. “updated drivers aren't available” (even though they are)