Ask The Expert

  Tieble
14/1/2019 4:22 PM

Dear Stata-clever colleagues and friends,

I am hoping someone can help me with a Stata problem (ASAP):

I'm working on a dataset with multiple observations per id; let's say it has about N=5500 unique ids.
I have a date variable (datevar), and each id has multiple dates.
How can I count the number of dates per id? If possible, I would also like to account for duplicate dates within the same id: for example, if the date refers to when a treatment was started, several treatments may have been started on the same day, but I only want to count each unique date once per id.
I also do not want to count observations that have missing dates.
I've been experimenting with:
bysort id datevar : gen max=_N
bysort id datevar : gen seq=_n
But they don't seem to give me what I need: a count of the number of different dates per id.
Can anyone help?
Kind regards,
Stine (EPIET cohort 2008 - C14)
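
The counting logic asked for above (distinct, non-missing dates per id) can be sketched in plain Python. This is only an illustration of the logic, not Stata syntax; the records and the name `count_unique_dates` are hypothetical. In Stata itself, `egen tag = tag(id datevar)` followed by `egen ndates = total(tag), by(id)` is a commonly used idiom for the same count, but it is worth checking against the `egen` documentation before relying on it.

```python
from collections import defaultdict

# Hypothetical records: (id, datevar); None stands for a missing date.
records = [
    (1, "2010-03-01"),
    (1, "2010-03-01"),  # duplicate date within the same id, counted once
    (1, "2010-04-15"),
    (1, None),          # missing date, must not be counted
    (2, "2011-01-20"),
    (3, None),          # id with only missing dates
]

def count_unique_dates(rows):
    """Return {id: number of distinct non-missing dates}."""
    dates_by_id = defaultdict(set)
    for pid, d in rows:
        dates_by_id[pid]  # touch the key so ids with only missing dates get 0
        if d is not None:
            dates_by_id[pid].add(d)
    return {pid: len(ds) for pid, ds in dates_by_id.items()}

ndates = count_unique_dates(records)
print(ndates)
```

Using a set per id is what removes duplicate dates; skipping `None` is what excludes missing dates.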
  Arnold Bosman
14/1/2019 4:22 PM


I am wondering if anyone has any advice on conducting structured observations in sensitive situations.

I am hoping to conduct, as part of some formative research, structured observations of handwashing within households in the IDP camps of Myanmar. However, concerns have been raised about conducting structured observations within households here. Because we are working in camps with conservative Muslim populations, it is thought unlikely that households will agree to a female enumerator entering to conduct observations, since cultural norms do not allow a female to be in the presence of an unknown male.

It was suggested that we use two female enumerators; however, this may also be unacceptable.

I am wondering if you know of any solutions or ideas for this situation, and what alternative, valid methods could be used to measure or assess handwashing in this case.

Thank you

  Tieble
04/1/2019 4:22 PM

Dear epi-friends,

I need your help. I am looking for some input: I am developing a 1-2 day curriculum on infectious disease transmission for primary-school children aged 9-11 years. The module is focussed on (preventing) malaria transmission, but we take it a bit broader, to infectious disease transmission and control in general.

I have incorporated the "sneeze game" and will do a small classroom game with starch/water/iodine to mimic transmission and prevention. But I am looking for some more online tools suitable for children aged 9-11. Ideally the tool or game includes options or decisions that influence the outcome of the game and that can be discussed in the classroom feedback session (e.g. which intervention has the largest effect on new case numbers).

Any ideas welcome, and highly appreciated! See you all @ ESCAIDE!

Best wishes,

Alma Tostmann, cohort 2009.

(Radboud university medical centre/ The Netherlands)

  Arnold Bosman
4/1/2019 4:22 PM

Dear FEM-Wiki users,

You may have seen that I posted on various social media platforms today (03.05.2016), e.g. on Twitter, on the EAN site on Facebook and in the closed EPIET / EUPHEM / FETP group on LinkedIn, in order to engage with as many Field Epidemiologists and Microbiologists as possible. This is for a piece I am working on for the forthcoming EAN (EPIET Alumni Network) Newsletter addressing the question:

Should Field Epidemiologists & Microbiologists be on Twitter or other types of social media as part of their job?

Some of you may have seen the interesting reply from Arnold Bosman in the LinkedIn group.

I know this forum is meant for epidemiological questions - but I would very much appreciate any input on this question from experts using the FEM-Wiki.

What are your experiences with using social media for work? Do you do it? If yes, how? If no, why not?

I am currently working on my PhD and as a freelance epidemiologist in the area of infections among people who inject drugs and their access to health care and harm reduction services. For this work I use Twitter quite a lot to stay updated on news and new developments in my field, for example from other researchers and relevant organisations working in my fields of interest.

Many general institutions such as ECDC, WHO, CDC and most national public health institutes (e.g. PHE, RKI, SSI) are active on Twitter, as are many NGOs, patient groups, scientific journals and academic groups. The good thing about Twitter is that you decide whom you follow and what type of news appears in your "News Feed", so you can choose which institutions and individuals to follow depending on your interests and areas of work.

Did you check if the leading experts in your area are on Twitter? Or your favourite journal?

Twitter is also great for following events and engaging in them even if you can't attend. Conferences such as ESCAIDE and ECCMID have active Twitter accounts, which in my opinion are very useful to follow as a Field Epidemiologist or Microbiologist.

Twitter (and other social media platforms!) can also be a great way to promote your work, for example your published articles.

I would be very interested in hearing how members in this forum use Twitter or other social media platforms for their work.

Many kind regards,

Stine Nielsen

PhD student, EPIET alumna and freelance epidemiologist

  Tieble
4/1/2019 4:22 PM

Dear all,

I've done a web survey, and I selected my controls matched to my cases on birth year. My problem now is that cases answer questions on risk factors from a few years back, so the mean age of cases is lower than that of controls. At the same time, age is an important risk factor for my disease, and to complicate things further, the relationship between age and disease is not simply linear: the risk increases with age and then decreases at very high age. What to do? In my logistic model, SAS now models the age difference between the groups in the "wrong" direction: it reports younger age as a risk factor for disease (since cases are younger), which is incorrect. The easiest way would be to exclude age from my model, since this variable was used to select the controls, but is it OK to do that? Or is there a better way to handle it?

Happy for your help!

  Arnold Bosman
4/1/2019 4:22 PM

Hello everyone,

I am currently working at the Caribbean Public Health Agency as a Field Epi and one of the projects I am working on is drafting a proposal for hotel-based surveillance for the country member states. I am having some challenges in figuring out how this might work and would love to get any thoughts/opinions on the subject.

It would be syndromic surveillance in sentinel hotels. The primary objective is to detect outbreaks in hotels early. Since tourism is such a huge part of the economy, the goal is to identify outbreaks early in order to stop them from spreading; in the past, outbreaks have led to lawsuits, hotel closures and loss of business. The four primary syndromes under surveillance would be: Fever and respiratory symptoms (ARI); Gastroenteritis; Undifferentiated Fever; Fever and Rash.

The main issue we are having is how the countries would be able to determine if the weekly numbers have reached an outbreak threshold. There is no baseline data and from a thorough literature review, we found limited information on other hotel-based surveillance that has been conducted (only in Jamaica, with no threshold information).

The countries all have limited resources and expertise, so the thresholds cannot be too complicated or depend on statistical software (e.g., SAS, Stata). Ideally it would be something simple like “3%” of guests, but we do not want to just pick something without some sort of scientific rationale.

I would be appreciative if you have any suggestions on the above and also if you have heard of any hotel-based surveillance being conducted other than in Jamaica.
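
One simple way to think about a threshold without a historical baseline is a moving baseline: compare each week's syndromic rate with the mean plus two standard deviations of the preceding few weeks. The sketch below is only an illustration of that arithmetic, which could equally be done in a spreadsheet. The four-week window, the multiplier of 2, the function name `flag_weeks` and all numbers are assumptions, not values from any published hotel surveillance system.

```python
from statistics import mean, stdev

def flag_weeks(weekly_cases, weekly_guests, baseline_weeks=4, z=2.0):
    """Flag weeks whose rate per 100 guests exceeds mean + z*SD of the
    preceding `baseline_weeks` weeks (assumed rule, for illustration only)."""
    rates = [100.0 * c / g for c, g in zip(weekly_cases, weekly_guests)]
    flags = []
    for i, rate in enumerate(rates):
        history = rates[max(0, i - baseline_weeks):i]
        if len(history) < baseline_weeks:
            flags.append(False)  # not enough history yet to set a threshold
            continue
        threshold = mean(history) + z * stdev(history)
        flags.append(rate > threshold)
    return rates, flags

# Hypothetical weekly gastroenteritis counts and guest numbers for one hotel.
cases = [2, 3, 2, 3, 2, 9]
guests = [200, 200, 200, 200, 200, 200]

rates, flags = flag_weeks(cases, guests)
for week, (rate, flagged) in enumerate(zip(rates, flags), start=1):
    print(week, round(rate, 2), "ALERT" if flagged else "")
```

Dividing by the number of guests keeps the rule comparable across hotels of different size and across high and low season.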



  Annick
4/1/2019 4:22 PM

Dear All

Judging by how successful Lorenzo's last request was on this forum, I would like to know if anyone has guidelines on how to conduct an outbreak investigation, specifically for a nosocomial outbreak.

Does such a thing exist?

Thanks very much,

Annick (EPIET Cohort 10, currently working for MSF)

  Tieble
4/1/2019 4:22 PM

Hello Experts,

Given the great and helpful response rate to my previous post, let me use FEM-Wiki again.

I am using EpiData for simple analysis here in Fiji so that, once I leave, my colleagues can use a simple and free programme to analyse surveillance data. I have two queries.

1) How do I exclude/include time periods in the analysis? No matter how I specify dates (e.g. "MEANS nos if labregdate="03/01/2014"", or "MEANS nos if labregdate=03/01/2014"), it always gives me an error.

2) How do I recode variables to missing? In our database we had specified that 01/01/1800 should be entered for missing dates (don't ask me why), but now I want to recode that to actual missing, since leaving it as a date from 214 years ago really screws up the analysis.

Looking forward to your precious advice!

Many thanks,
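
The two steps asked about above (recoding the 01/01/1800 placeholder to missing, then restricting the analysis to a time period) can be sketched in plain Python. This is not EpiData syntax; it only illustrates the logic. The dd/mm/yyyy date format, the example rows and the helper name `parse_date` are assumptions.

```python
from datetime import date, datetime

SENTINEL = date(1800, 1, 1)  # placeholder the database used for "missing"

def parse_date(text):
    """Parse dd/mm/yyyy (assumed format); return None for empty strings
    or for the 1800 placeholder."""
    if not text:
        return None
    d = datetime.strptime(text, "%d/%m/%Y").date()
    return None if d == SENTINEL else d

# Hypothetical rows with the variable names used in the post.
rows = [
    {"nos": 3, "labregdate": "03/01/2014"},
    {"nos": 5, "labregdate": "01/01/1800"},  # becomes missing
    {"nos": 4, "labregdate": "15/02/2014"},
    {"nos": 7, "labregdate": "20/06/2013"},  # outside the period
]
for r in rows:
    r["labregdate"] = parse_date(r["labregdate"])

# Keep only records registered in 2014; missing dates are excluded.
start, end = date(2014, 1, 1), date(2014, 12, 31)
in_period = [r for r in rows
             if r["labregdate"] is not None and start <= r["labregdate"] <= end]
mean_nos = sum(r["nos"] for r in in_period) / len(in_period)
print(len(in_period), mean_nos)
```

Comparing real date values rather than text strings is what makes the period filter work, which may also be the source of the error in the quoted commands.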


  lpezzoli
4/1/2019 4:22 PM

Ok, this is a very loooooong shot, but does anyone know where to find updated shapefiles (GIS) for the Fiji islands? I am currently on an assignment here and I would like to plot tuberculosis incidence by administrative area. Thanks!


  Arnold Bosman
4/1/2019 4:22 PM

What is the unit of study in epidemiology? Can someone please explain it?

  Arnold Bosman
4/1/2019 4:22 PM

I am involved in a nested case-control (CC) study that involves cases entering a program and looking at failure by six months. Cases and controls are individually matched on month/year of entry.

A primary risk factor of interest is receiving a medical diagnosis (psychiatric, musculoskeletal, respiratory). I have the person-time at risk for cases and controls to develop this risk factor.

Now, for background...

The relative risk (RR) is defined as ((N cases exposed)/(N population exposed))/((N cases unexposed)/(N population unexposed)).

The incidence rate ratio (IRR) is defined as ((N cases exposed)/(Person-time exposed))/((N cases unexposed)/(Person-time unexposed)).

The odds ratio (OR) is defined as ((N cases exposed)*(N controls unexposed))/((N controls exposed)*(N cases unexposed)). Under the rare disease assumption the OR approximates the RR.

Can I validly expand on the odds ratio using person-time, ((N cases exposed)*(Person-time unexposed))/((N cases unexposed)*(Person-time exposed)), to approximate the IRR?

If I can do this, would logistic regression be appropriate? What other analytic approaches would you suggest?

Thanks, David
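
The three definitions given as background can be written out as a short Python sketch to make the arithmetic concrete. All counts and person-time values are hypothetical. The sketch only shows that the cross-product form (cases exposed × person-time unexposed)/(cases unexposed × person-time exposed) is algebraically the same as the IRR definition; it does not settle whether substituting person-time for controls is valid in a matched nested case-control design.

```python
def relative_risk(cases_exp, pop_exp, cases_unexp, pop_unexp):
    """RR = risk in the exposed / risk in the unexposed."""
    return (cases_exp / pop_exp) / (cases_unexp / pop_unexp)

def incidence_rate_ratio(cases_exp, pt_exp, cases_unexp, pt_unexp):
    """IRR = rate in the exposed / rate in the unexposed (person-time denominators)."""
    return (cases_exp / pt_exp) / (cases_unexp / pt_unexp)

def odds_ratio(cases_exp, controls_exp, cases_unexp, controls_unexp):
    """OR as the cross-product ratio of the 2x2 table."""
    return (cases_exp * controls_unexp) / (controls_exp * cases_unexp)

# Hypothetical numbers, for illustration only.
cases_exp, cases_unexp = 30, 20
pop_exp, pop_unexp = 1000, 2000          # persons
pt_exp, pt_unexp = 500.0, 1000.0         # person-years
controls_exp, controls_unexp = 40, 80

rr = relative_risk(cases_exp, pop_exp, cases_unexp, pop_unexp)
irr = incidence_rate_ratio(cases_exp, pt_exp, cases_unexp, pt_unexp)
or_ = odds_ratio(cases_exp, controls_exp, cases_unexp, controls_unexp)

# Cross-product form of the IRR: same quantity, rearranged.
irr_cross = (cases_exp * pt_unexp) / (cases_unexp * pt_exp)

print(rr, irr, or_, irr_cross)
```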

  Mumtaz Ali Laghari
4/1/2019 4:22 PM

Dear colleagues,

I am working on the design of a course whose target group is medical staff with technical field experience in medical and humanitarian emergencies, but without enough analytical capacity to decide whether an intervention is justified. The main objective of the course is to learn how to analyze the data gathered in a Rapid Assessment after an alert about a potential ongoing emergency.

We are creating a module on calculating and analyzing essential health indicators. I would need some advice: I need documentation on the practical analysis of health indicators, with examples. Is there any tool with these characteristics?


Thanks a lot


4/1/2019 4:22 PM

Dear all,

Does anyone know of a freeware or online tool that you can use to build an epidemic curve? I mean one where you can enter data and get the curve, ideally with extra options such as a breakdown by gender or a distinction of primary cases. That would be a great help!


Linda Verhoef
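
At its core, an epidemic curve is a count of cases per time unit, with empty days kept in place. The minimal Python sketch below shows that step with a text histogram; the onset dates and the function name `epi_curve` are hypothetical, and it is not a substitute for a dedicated tool.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical dates of symptom onset, one entry per case.
onsets = [
    date(2019, 1, 2),
    date(2019, 1, 3), date(2019, 1, 3),
    date(2019, 1, 5), date(2019, 1, 5), date(2019, 1, 5),
]

def epi_curve(onset_dates):
    """Return [(day, number of cases)] for every day from first to last onset,
    including days with zero cases so the time axis has no gaps."""
    counts = Counter(onset_dates)
    day, last = min(counts), max(counts)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve

curve = epi_curve(onsets)
for day, n in curve:
    print(day.isoformat(), "#" * n)
```

A breakdown by gender or primary/secondary case would amount to keeping one such counter per category.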

  Mathias Leroy
4/1/2019 4:22 PM

Dear Authoritative expert,


In order to correctly label our weekly Communicable Disease Threat Report (CDTR), we are considering EPI weeks vs ISO weeks. However, I am having trouble identifying an authoritative/original source reference for the definition of epidemiological weeks. Could you please help me?

Best regards


Pasi Penttinen
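
To make the practical difference between the two numbering schemes concrete, here is a minimal Python sketch comparing ISO 8601 weeks (Monday start) with a Sunday-based epi week as used in the US CDC's MMWR, where week 1 is the first Sunday-to-Saturday week with at least four days in the new year. Which epi-week convention the CDTR would follow is not stated in the post, so the Sunday-based rule is an assumption here, and the function names are hypothetical.

```python
from datetime import date, timedelta

def iso_week(d):
    """ISO 8601 (year, week): weeks start on Monday; week 1 contains 4 January."""
    iso = d.isocalendar()
    return iso[0], iso[1]

def _sunday_week1_start(year):
    """Sunday that starts week 1 under the Sunday-based rule.
    A Sunday-to-Saturday week has at least four days in the new year
    exactly when it contains 4 January."""
    jan4 = date(year, 1, 4)
    days_since_sunday = (jan4.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    return jan4 - timedelta(days=days_since_sunday)

def sunday_epi_week(d):
    """Sunday-based epi (year, week), following the MMWR week definition."""
    year = d.year
    if d >= _sunday_week1_start(year + 1):
        year += 1          # late-December days that already belong to week 1
    elif d < _sunday_week1_start(year):
        year -= 1          # early-January days that still belong to the old year
    start = _sunday_week1_start(year)
    return year, (d - start).days // 7 + 1

for d in (date(2018, 12, 30), date(2018, 12, 31), date(2016, 1, 2)):
    print(d.isoformat(), "ISO", iso_week(d), "Sunday-based", sunday_epi_week(d))
```

The two schemes differ on Sundays and around year boundaries, which is where report labels are most likely to disagree.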


  Arnold Bosman
4/1/2019 4:22 PM

What are the best tools for measuring disease burden?

Today, there is great demand for global burden estimates.

The principle guiding the burden of disease approach is that the best estimates of incidence, prevalence, and mortality can be generated by carefully analyzing all available sources of information in a country or region, and correcting for bias.

The disability-adjusted life year (DALY), a time-based measure that combines years of life lost due to premature mortality and years of life lost due to time lived in health states less than ideal health, was developed to assess the burden of disease.
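
The two components of the DALY described above can be sketched in Python. This uses the basic form without discounting or age weighting: years of life lost (YLL) as deaths × remaining standard life expectancy at age of death, and years lived with disability (YLD) as incident cases × disability weight × average duration. All numbers are hypothetical and the function names are illustrative.

```python
def years_of_life_lost(deaths, life_expectancy_at_death):
    """YLL = number of deaths x remaining standard life expectancy (years)."""
    return deaths * life_expectancy_at_death

def years_lived_with_disability(cases, disability_weight, duration_years):
    """YLD = incident cases x disability weight (0 to 1) x average duration (years)."""
    return cases * disability_weight * duration_years

def daly(deaths, life_expectancy_at_death, cases, disability_weight, duration_years):
    """DALY = YLL + YLD (no discounting or age weighting in this sketch)."""
    return (years_of_life_lost(deaths, life_expectancy_at_death)
            + years_lived_with_disability(cases, disability_weight, duration_years))

# Hypothetical disease: 10 deaths losing 30 years each,
# 200 cases with disability weight 0.2 lasting half a year.
yll = years_of_life_lost(10, 30)
yld = years_lived_with_disability(200, 0.2, 0.5)
total = daly(10, 30, 200, 0.2, 0.5)
print(yll, yld, total)
```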

  Patty Kostkova
4/1/2019 4:22 PM

It seems that here the NEW post button is working ...

  Vladimir Prikazsky
4/1/2019 4:22 PM

testing the forum

  CeRC
4/1/2019 4:22 PM

What is the 'Ask the Expert' forum for? Who can post? Who will answer?

