Discussion:
getting the confidence interval
Werner LEMBERG
2018-10-12 08:40:22 UTC
Folks,


I would like to get a 95% confidence interval so that I could use it
in AGGREGATE, e.g.,

AGGREGATE OUTFILE * MODE ADDVARIABLES
/BREAK=...
/Mean = mean(V)
/CI = ci(V, 0.95)

What must I do to get the result of my hypothetical `ci' function?
I'm a PSPP novice, so maybe there is a better solution than AGGREGATE
– what I ultimately want is to emit the confidence interval of a
variable to a CSV file using SAVE TRANSLATE.


Werner
John Darrington
2018-10-12 10:29:00 UTC
The confidence interval is a concept associated with a hypothesis.
If it's the confidence interval on the test for a mean value, typically you
would get that by using a T-Test.


On Fri, Oct 12, 2018 at 10:40:22AM +0200, Werner LEMBERG wrote:

Folks,


I would like to get a 95% confidence interval so that I could use it
in AGGREGATE, e.g.,

AGGREGATE OUTFILE * MODE ADDVARIABLES
/BREAK=...
/Mean = mean(V)
/CI = ci(V, 0.95)

What must I do to get the result of my hypothetical `ci' function?
I'm a PSPP novice, so maybe there is a better solution than AGGREGATE
??? what I ultimately want is to emit the confidence interval of a
variable to a CSV file using SAVE TRANSLATE.


Werner
_______________________________________________
Pspp-users mailing list
Pspp-***@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users
--
Avoid eavesdropping. Send strong encrypted email.
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
Mark Hancock
2018-10-12 13:01:29 UTC
I unfortunately don't know enough about PSPP syntax to suggest how to do
this, but a CI is *not* always associated with a hypothesis and can be
calculated from just the mean, SD and n (plus a reference distribution,
which is typically the normal one). Typically the formula is something like:

mean ± z(SD/sqrt(n)), where z is the quantile of that distribution for the
chosen confidence level (1.96 for a 95% CI).
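A minimal PSPP sketch of where z comes from: for a two-sided interval it is the inverse normal CDF evaluated at 1 - (1 - level)/2, which PSPP provides as IDF.NORMAL(p, mu, sigma). The variable names `level` and `z` are illustrative only.

```pspp
* Sketch: z multipliers for a few two-sided confidence levels.
* Each level leaves (1 - level)/2 in each tail.
DATA LIST LIST /level (F4.2).
BEGIN DATA
0.90
0.95
0.99
END DATA.

* IDF.NORMAL(p, mu, sigma) is the inverse normal CDF.
COMPUTE z = IDF.NORMAL(1 - (1 - level) / 2, 0, 1).
FORMATS z (F8.4).
LIST.
```

For the three levels the listed z values should be roughly 1.645, 1.960 and 2.576; the familiar 1.96 is the 95% case.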

Alan Mead
2018-10-12 13:13:23 UTC
I think John is saying that in SPSS/PSPP you need to use a statistical
function to generate statistical results like a CI. For example, T-TEST
will produce a 95% CI for the mean difference in independent-samples
t-tests. Other routines may provide other confidence intervals.

But maybe you want to use compute to create a variable for each case:

compute LB = x - 1.96 * 10.1/sqrt(31).
compute UB = x + 1.96 * 10.1/sqrt(31).

This creates variables LB and UB for all cases in your data file and you
need to supply Z, mean, SD, and n.

Connecting that to your query about AGGREGATE, which takes groups of
cases (defined by unique values on BREAK) and creates summary stats, you
could use AGGREGATE to create mean, SD and n in the above equation and
then use those compute statements to calculate the bounds of the CI for
the groups of cases. So, you would have to do two steps. First an
AGGREGATE command that creates mean, SD and n, followed by the two
computes above.

You would end up with a dataset containing the mean, sd, n, LB, and UB
for each group (defined by a unique value of BREAK) in the original dataset.

Hopefully you only have one variable (or a very few) in the data that
you want this on, because you have to create mean, SD and n for each
variable.
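The two steps described above might look like this in PSPP syntax. This is a sketch assuming a grouping variable `g` and an analysis variable `v` (both hypothetical names); it keeps the normal-approximation multiplier of 1.96 from the COMPUTE example.

```pspp
* Sketch; g (grouping) and v (analysed variable) are hypothetical names.
SORT CASES BY g.

* Step 1: add the mean, SD and valid n of each group to every case.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=g
  /grp_mean = MEAN(v)
  /grp_sd = SD(v)
  /grp_n = N(v).

* Step 2: normal-approximation 95% bounds (z = 1.96).
COMPUTE lb = grp_mean - 1.96 * grp_sd / SQRT(grp_n).
COMPUTE ub = grp_mean + 1.96 * grp_sd / SQRT(grp_n).

LIST /VARIABLES=g grp_mean grp_sd grp_n lb ub.
```

MODE=ADDVARIABLES attaches the group statistics to every case; leaving it out would instead replace the data with one case per group, which is often more convenient for export.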

-Alan
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

http://www.alanmead.org

"You're an interesting species. An interesting mix.
You're capable of such beautiful dreams, and such
horrible nightmares. You feel so lost, so cut off,
so alone, only you're not. See, in all our
searching, the only thing we've found that makes
the emptiness bearable, is each other."

-- Carl Sagan, Contact
Dr. Oliver Walter
2018-10-12 13:22:10 UTC
A confidence interval is mathematically equivalent to its corresponding
hypothesis test: the two-sided test at level alpha is significant if and
only if the corresponding (1 - alpha) confidence interval does not contain
the parameter value of the null hypothesis. Hence, whether you calculate
the confidence interval or conduct the hypothesis test doesn't really
matter.

mean(X) +/- t * sd/sqrt(n): confidence interval for the expected value
of X, mu, where X is normally distributed with unknown population variance
and t is the critical value of the t distribution with n-1 degrees of
freedom (the 0.975 quantile for a 95% CI)

t = (mean - mu0) / (sd/sqrt(n)): test statistic for testing whether mu
equals the value in the null hypothesis, mu0, with X normally distributed
with unknown population variance

If mu0 is not contained in the confidence interval, the hypothesis test
is significant.
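The equivalence can be checked in one line for the one-sample t case. Here x̄ is the sample mean, s the sample SD, mu0 the null value, and t_{1-α/2, n-1} the critical value (notation added for this sketch); the only step used is multiplying both sides by s/√n > 0.

```latex
\begin{aligned}
|T| = \left|\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\right| > t_{1-\alpha/2,\,n-1}
&\iff |\bar{x}-\mu_0| > t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \\
&\iff \mu_0 \notin \left[\bar{x} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right]
\end{aligned}
```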

Dr. Oliver Walter
Mark Hancock
2018-10-12 14:56:56 UTC
This is a good point, yes. I'm not the original requester, but I think they
were really asking for a simple way to get a CI when reporting
summary/descriptive statistics (without having a second mean to compare
to). In SPSS you can do this:
https://en.wikibooks.org/wiki/Using_SPSS_and_PASW/Confidence_Intervals

Maybe this is just my misunderstanding of AGGREGATE and PSPP syntax, but my
point was just that there's nothing inherent about the question that should
require a t-test - i.e., you can use z by default (and t-tests are really
just extensions of z-scores anyway). z=1.96 works for 95% CIs, and Alan's
suggestion does what I think the original requester was asking.

Pointing to t-tests isn't a bad idea either, though, and maybe providing
syntax for how to reduce it to a z-score would help the original requester
(though I don't think they have another mean or value to compare it to).

Dr. Oliver Walter
2018-10-12 15:16:52 UTC
I was only responding to your statement that a CI is *not* always
associated with a hypothesis test. The equations I mentioned were only
examples of a confidence interval and its equivalent hypothesis test.

BTW: It's not safe to always use z instead of t. If your sample size is
small and you don't know the population variance, it's better to use t
instead of z.
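A small PSPP sketch to see how much t and z differ for a 95% CI. The sample sizes are arbitrary illustrative values; IDF.T(p, df) and IDF.NORMAL(p, mu, sigma) are PSPP's inverse t and normal distribution functions.

```pspp
* Sketch: t multiplier (df = n - 1) versus z for a 95% CI.
DATA LIST LIST /n (F5.0).
BEGIN DATA
5
15
30
100
1000
END DATA.

COMPUTE t_crit = IDF.T(0.975, n - 1).
COMPUTE z_crit = IDF.NORMAL(0.975, 0, 1).
* A ratio above 1 means a z-based interval is too narrow.
COMPUTE ratio = t_crit / z_crit.
FORMATS t_crit z_crit ratio (F8.3).
LIST.
```

For n = 15, as in Werner's groups, the t multiplier with 14 df is about 2.145, so a z-based interval would be roughly 9% too narrow; the gap shrinks as n grows.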
Werner LEMBERG
2018-10-12 19:48:04 UTC
Post by Dr. Oliver Walter
I just responded to your statements about the relations between CIs
and hypothesis test that a CI is *not* always associated with a
hypothesis. The equations I mentioned were only examples for a
confidence interval and its equivalent hypothesis test. [...]
Thanks a lot to all who have responded. I must add that I'm not only
a PSPP novice, my knowledge of (and, admittedly, my interest in)
statistics in general is very small. I'm basically looking for
statistical recipes that I can apply.

What I actually want to do is to replace the mediocre PDF output of
SPSS (which my daughter was using to create images for her MD thesis –
I don't know SPSS and she no longer has access to it, so there is no
chance to improve the created bar diagrams directly) with the route

.SAV file -> PSPP -> CSV -> LaTeX pgfplots

mainly to be able to recreate the diagrams programmatically, without
using a GUI (I guess that R would probably be a better choice for that
purpose, but I don't know it either).

In other words I'm just poking with a stick in the dark...

Here is the command that her group was using on SPSS.

GLM Var1 Var2 Var3 BY Group WITH Age
/WSFACTOR=Location 3 Polynomial
/METHOD=SSTYPE(3)
/PLOT=PROFILE(Group*Location)
TYPE=BAR ERRORBAR=CI MEANREFERENCE=NO
/PRINT=DESCRIPTIVE ETASQ
/CRITERIA=ALPHA(.05)
/WSDESIGN=Location
/DESIGN=Age Group.

[There are two groups, with approx 15 cases each.]

The diagram shows bars for the mean values of Var{1,2,3} together with
error bars indicating the CI.


Werner
Dr. Oliver Walter
2018-10-12 20:08:06 UTC
It seems to be a mixed ANCOVA with a within-subjects factor called
"Location", a between-subjects factor called "Group" and a covariate
"Age". I think that the GLM command in PSPP is not able to compute such
an analysis. GLM can only compute between-subjects designs in PSPP (cf.
PSPP manual, p. 143).
Werner LEMBERG
2018-10-13 05:24:15 UTC
Post by Dr. Oliver Walter
It seems to be a mixed ANCOVA with a within-subjects factor called
"Location", a between-subjects factor called "Group" and a covariate
"Age". I think that the GLM command in PSPP is not able to compute
such an analysis. GLM can only compute between-subjects designs in
PSPP (cf. PSPP manual, p. 143).
Yes. As mentioned before, however, I'm not interested in the complete
analysis but only in reproducing the bar diagram. Note that the
printed SPSS report never mentions a CI value; it is only used
for the error bars in the diagram – which are huge in this particular
case, BTW. In case SPSS uses different values, I hope it is safe to
assume that the error bars can be approximated by the simplest approach
(i.e., based on a model for few samples).

So: Is there a command in PSPP that directly gives me a standard CI
value? Or do I have to cook it up using the formula already
mentioned in this thread?


Werner
Dr. Oliver Walter
2018-10-13 06:36:22 UTC
I don't think that PSPP can produce bar charts with confidence intervals
or anything similar (bar charts for means aren't the best idea
anyway). I think it is only possible to split the data file to compare
groups and then calculate confidence intervals for the mean within these
groups.

Command (Group and Var1 to Var3 are the names from your GLM command;
list your own split variables after BY and your dependent variables
after /VARIABLES):

SORT CASES BY Group.
SPLIT FILE LAYERED BY Group.

T-TEST /TESTVAL=0
    /VARIABLES=Var1 Var2 Var3
    /MISSING=ANALYSIS
    /CRITERIA=CI(0.95).

With /TESTVAL=0 the mean difference equals the mean itself, so the
confidence interval reported for the difference is the confidence
interval for the mean.

Then you can use the means and the bounds of the confidence intervals to
draw your own graphs outside PSPP. I think that this is the better way
because the quality of the charts in PSPP is not very good and I cannot
recommend them for use in any thesis or research paper. Or you could
use other statistical software that is able to do what you need.
Werner LEMBERG
2018-10-13 14:04:41 UTC
Very nice, thanks!
Post by Dr. Oliver Walter
Then you can use the means and the bounds of the confidence
intervals to draw your own graphs outside PSPP.
How do I get the results of the T-TEST command shown on the console
into the current dataset so that I can export them to a CSV file?


Werner
Dr. Oliver Walter
2018-10-13 14:13:34 UTC
The results of any analysis are printed in the PSPP output and are
generally not saved in the dataset.
John Darrington
2018-10-14 06:46:02 UTC
If I understand the use case properly, I think that you can do what you
want with an aggregate followed by a few simple compute commands:


AGGREGATE OUTFILE * MODE ADDVARIABLES
/BREAK=g
/Mean = mean(V)
/sd = sd(v)
/n = n(v)
.

compute ci_upper=mean + sd/sqrt(n).
compute ci_lower=mean - sd/sqrt(n).

list.

You might want to check out the continuous distribution functions too.
See section 7.7.10 of the user manual.
Werner LEMBERG
2018-10-14 07:04:44 UTC
Post by John Darrington
If I understand the use case properly, I think that you can do what
you want with with an aggregate followed by a few simple compute
commands: [...]
Thanks!


Werner
Dr. Oliver Walter
2018-10-14 07:28:47 UTC
Post by Werner LEMBERG
AGGREGATE OUTFILE * MODE ADDVARIABLES
/BREAK=g
/Mean = mean(V)
/sd = sd(v)
/n = n(v)
.
compute ci_upper=mean + sd/sqrt(n).
compute ci_lower=mean - sd/sqrt(n).
list.
Sorry for interrupting, but this doesn't give a 95% (or 90%) CI, but
only mean +/- one standard error, which is a 68% CI if X is normally
distributed and sd equals the population standard deviation, or an
approximate 68% CI if the sample size is large. You have to include a
t value in the equation to calculate a 95% (or 90%) CI. If your
sample sizes are small and differ from each other, you should use
different t values for each CI and each group. If your sample size is
large you could use one z value (1.96) for all groups, but this is not
appropriate in this case (n1 = n2 = 15; the sample sizes are too small
for this standard normal approximation).
John Darrington
2018-10-14 07:41:12 UTC
You are right. Which is why I suggested using one of the CDF functions.
There is no T function, but there is an F function, which I think is the
same if you set DF2 to 1. But you probably know better than me about
those details. Perhaps IDF.F (0.05, N -1, 1) is what Werner wants (I
haven't tried it)?


J'
Dr. Oliver Walter
2018-10-14 08:07:09 UTC
Post by John Darrington
Which is why I suggested using one of the CDF functions.
There is no T function, but there is an F function, which I think is the
same if you set DF2 to 1. But you probably know better than me about
those details. Perhaps IDF.F (0.05, N -1, 1) is what Werner wants (I
haven't tried it)?
Ok. I see. I think the t distribution is implemented in PSPP.
IDF.T(0.975, n-1) gives the correct t value for a 95% CI. Werner could
try it.
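Putting the AGGREGATE approach together with the IDF.T suggestion, a sketch of the full route to a CSV file could look like this. The names `g`, `v` and `ci.csv` are hypothetical; AGGREGATE is used here without MODE=ADDVARIABLES so that the active dataset holds one case, and therefore one CSV row, per group.

```pspp
* Sketch; g, v and ci.csv are hypothetical names.
SORT CASES BY g.

* Replace the data with one case per group.
AGGREGATE OUTFILE=*
  /BREAK=g
  /grp_mean = MEAN(v)
  /grp_sd = SD(v)
  /grp_n = N(v).

* 95% CI for the mean: t quantile with n - 1 degrees of freedom.
COMPUTE half_w = IDF.T(0.975, grp_n - 1) * grp_sd / SQRT(grp_n).
COMPUTE ci_lower = grp_mean - half_w.
COMPUTE ci_upper = grp_mean + half_w.

* One row per group in the CSV file.
SAVE TRANSLATE
  /OUTFILE='ci.csv'
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE
  /KEEP=g grp_mean ci_lower ci_upper.
```

A group with only one valid value has no SD and zero degrees of freedom, so its bounds would presumably come out system-missing rather than as numbers.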
John Darrington
2018-10-14 15:54:17 UTC
Oh you're right. There is an IDF.T function after all, so it's even
easier.

J'
Dave Trollope
2018-10-16 17:09:32 UTC
Hi,

We are using the following script to generate CSVs for a subset of
variables in an SAV file, and it looks like pspp is loading the entire
SAV file into memory. (Some of our SAV files are quite large - 4GB.)
I'm wondering if there is an option or a way we can reduce the memory
usage when extracting a subset of variables?

GET FILE = "{}"

SAVE TRANSLATE
  /OUTFILE="{}"
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE
  /KEEP={}
  /MISSING=RECODE
  /CELLS=LABELS.

Cheers
Dave
John Darrington
2018-10-16 17:18:35 UTC
Yes.

Check out the SET WORKSPACE command.
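A hedged sketch of what this might look like in practice. The WORKSPACE value (in kilobytes, as far as I know) and the file and variable names are placeholders; restricting the variables with /KEEP already on GET, rather than only on SAVE TRANSLATE, is an additional idea that may also reduce memory use.

```pspp
* Sketch; big.sav, subset.csv and the variable names are hypothetical.
* WORKSPACE is (as far as I know) in kilobytes; data beyond it should
* spill to temporary files instead of memory.
SET WORKSPACE=65536.

* Read only the variables that will be exported.
GET FILE='big.sav' /KEEP=id age score.

SAVE TRANSLATE
  /OUTFILE='subset.csv'
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE
  /MISSING=RECODE
  /CELLS=LABELS.
```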

J'

Ben Pfaff
2018-10-16 21:10:02 UTC
Another way to translate .sav to .csv is to run the "pspp-convert"
utility, without even starting the main PSPP executable.