Automated text analysis to discover an author's gender

Gender

The other day I ran an experiment: I built a web application that assesses a piece of text and attempts to predict the gender of its author. The application uses a machine learning text classifier, which means the system had to be trained to recognise ‘male’/‘female’ text. It was trained on a corpus of over 10,000 text entries, split 50/50 male/female.
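For anyone curious what "training a text classifier" can look like, here is a minimal sketch in Python. The post doesn't say which algorithm the application uses, so the multinomial naive Bayes approach, the class name and the toy data are all assumptions made for illustration. The labels are deliberately neutral ("a" and "b"); the same mechanics apply to any two-way split of a labelled corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocabulary = set()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            # Start from the log prior, then add one log likelihood per word.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(self.vocabulary)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up toy corpus; a real one would be far larger (the post mentions 10,000+ entries).
toy_corpus = [
    ("the red kite flew over the hill", "a"),
    ("a red balloon and a red kite", "a"),
    ("the blue boat sailed across the lake", "b"),
    ("a blue boat on a blue lake", "b"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(toy_corpus)
```

Calling `classifier.predict(...)` on a new piece of text returns whichever label has the higher score. With a balanced 50/50 corpus the prior is the same for both labels, so the word likelihoods do all the work.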

Simple research, it's the way forward

A better way to conduct research?

How can market research provide better insight for its clients? It's a question that needs asking, and part of the answer could lie in simple research. By simple I do not mean simplistic, dumbed down, quick or even dirty. By simple I mean clear, accessible and focused.

Rather than trying to cover off every research objective in one large project, let's do research in baby steps, as simple mini-projects, with each step providing just enough "insight" to make a difference for whoever it's being conducted for.

AND joins with hook_views_data

SQL statement

hook_views_data is the Views hook used to describe tables to Views. The Views documentation makes it pretty clear how to create new view fields which relate to a base table, e.g. nodes.

However, it gets a little hazy when it comes to defining fields which require a join that meets more than one condition, e.g. 'LEFT JOIN ... ON x=a AND y=b'. You need this sort of join if you want to pull data related to an individual value from a CCK field which relates to a value in another table.
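To see what a two-condition join does at the SQL level, here is a small sketch using Python's built-in sqlite3 module rather than Drupal. It does not show hook_views_data itself, and the table and column names (node, field_value, delta) are made up for illustration rather than taken from the real CCK schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE field_value (nid INTEGER, delta INTEGER, value TEXT);
    INSERT INTO node VALUES (1, 'First'), (2, 'Second');
    INSERT INTO field_value VALUES (1, 0, 'x'), (1, 1, 'y'), (2, 1, 'z');
""")

# Both conditions sit in the ON clause, so a node with no matching row
# is still returned, with NULL in place of the field value.
join_rows = conn.execute("""
    SELECT node.nid, field_value.value
    FROM node
    LEFT JOIN field_value
        ON field_value.nid = node.nid AND field_value.delta = 0
    ORDER BY node.nid
""").fetchall()

# Moving the second condition into WHERE filters after the join,
# which drops node 2 entirely.
where_rows = conn.execute("""
    SELECT node.nid, field_value.value
    FROM node
    LEFT JOIN field_value ON field_value.nid = node.nid
    WHERE field_value.delta = 0
    ORDER BY node.nid
""").fetchall()
```

The difference between `join_rows` and `where_rows` is why the extra condition has to be part of the join itself and not a filter added afterwards.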

Tertiary influencers in social network example

Next tertiary influencers

Check out the picture above (click on it to make it bigger). It's part of a side project I've been messing about with which highlights who your followers, fans, etc. follow. The idea is that you can see who your followers, or your competitors' followers, follow on a social network.

By itself this is interesting but it really just forms a foundation for further research. You can use this sort of analysis as a starting point to:

Twitter - GEXF file for social graphing

Twitter follower graph

Gephi is awesome, Twitter is awesome. But I couldn't find any Twitter data in a file format suitable for Gephi. So I ran a little script which pulled back my followers, my followers' followers, their followers' followers and so on. Left the script running on a cron for a bit. Cleaned up the data and spat it out in GEXF so we can have a play in Gephi. Feel free to download the file below for use in Gephi.
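If you'd rather build a file like this yourself, the sketch below shows one way to turn follower pairs into GEXF with Python's standard library. It is not the script described above: the function name and the sample accounts are invented, and it only writes the bare minimum of the GEXF 1.2 format (nodes and directed edges, no attributes).

```python
import xml.etree.ElementTree as ET


def build_gexf(follows):
    """Build a GEXF document from (follower, followed) pairs (hypothetical helper)."""
    root = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    # Give every account a numeric id the first time it appears.
    node_ids = {}
    for pair in follows:
        for name in pair:
            if name not in node_ids:
                node_ids[name] = str(len(node_ids))
                ET.SubElement(nodes, "node", {"id": node_ids[name], "label": name})

    # One directed edge per "follower follows followed" pair.
    for edge_id, (follower, followed) in enumerate(follows):
        ET.SubElement(edges, "edge", {
            "id": str(edge_id),
            "source": node_ids[follower],
            "target": node_ids[followed],
        })
    return ET.ElementTree(root)


# Invented sample data standing in for real follower lists.
sample_follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
tree = build_gexf(sample_follows)
gexf_xml = ET.tostring(tree.getroot(), encoding="unicode")
# tree.write("followers.gexf", encoding="utf-8", xml_declaration=True) saves it to disk.
```

The saved file should open in Gephi as a directed graph, with each account as a node and each "follows" relationship as an edge.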

Every time you produce a bad graph a fairy dies

Bad graph


Whilst it's unconfirmed that the production of awful charts leads to the death of small mythical beings, I'm pretty sure bad graphs are killing something which is at least as important as fairies.


I believe bad graphs kill ideas.

Getting researchers to work more creatively

mad scientist

At the dawn of the credit crunch, market researchers were asked to think "outside the box, inside the budget". A wonderful concept if ever I've heard one. The only problem is, how do you get market researchers working creatively? (Often we're not the most creative bunch.)

Notes concerning the creation of infographics

Napoleon's march
  1. Objective = (Audience + Data) * Analysis

    Before you even put pen to paper, or even think about double-clicking the PowerPoint icon, you need to figure out a few things.

    • Who are your audience?
    • What is your data saying?
    • Why is your data important to your audience?

    When you've answered these questions you should have a pretty good idea about what you want your diagram to say and who you want to say it to.

Update: Nightingale's rose

A version of Nightingale's rose diagram

As mentioned in my previous post, the way I had looked at reproducing Nightingale's old diagram was somewhat flawed. The code below produces rose diagrams in Processing correctly.

A couple of things to remember if you're just starting to use Processing:

Recreating Nightingale's rose diagram

"Rose" diagram

Data visualisation has been my pursuit of choice over the past few weeks; this has involved a lot of reading, trying to teach myself how to draw, and eating of cake. One of the most interesting stories about graphs I've come across involves Florence Nightingale and the diagrams she used to highlight the adverse effect of unhygienic conditions in the army.
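A rose (polar area) diagram of this kind encodes each value as the area of a wedge measured from the centre, not as its radius. Since every wedge spans the same angle, the radius has to scale with the square root of the value. The sketch below works through that arithmetic in Python rather than Processing. It isn't the code from the post, and the function names and numbers are made up for illustration.

```python
import math


def wedge_radii(values, max_radius=1.0):
    """Return one radius per value so that wedge AREA is proportional to the value."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    # The area of a fixed-angle wedge grows with r squared, so r scales with sqrt(value).
    return [max_radius * math.sqrt(max(v, 0) / largest) for v in values]


def wedge_area(radius, n_wedges):
    """Area of one wedge when the circle is split into n_wedges equal angles."""
    angle = 2 * math.pi / n_wedges
    return 0.5 * angle * radius ** 2


monthly_counts = [10, 40, 90, 160]  # made-up numbers, not Nightingale's data
radii = wedge_radii(monthly_counts, max_radius=100.0)
areas = [wedge_area(r, len(monthly_counts)) for r in radii]
```

With these numbers a value four times larger gets a wedge with twice the radius and four times the area. Scaling the radius directly by the value would exaggerate the larger months.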